RSS for Researchers

I would like to introduce RSS feeds and RSS readers, which I believe are very useful research tools in the internet era (RSS entry @ Wikipedia).

The problem first. I personally subscribe to only a few journals. To browse the latest articles published in the many other journals related to my research areas, I need to check the corresponding websites regularly. To check the latest issues of ten different journals, I need to visit ten different webpages. Alternatively, I can subscribe to the email notification services that many journals have offered for a long time. However, each email is a single object with several entries (articles) inside, so it is not easy to organize the entries. How about combining the lists of articles into one single list?

Put simply, each RSS feed is a list of entries provided by a server, e.g., news stories provided by a newspaper website, or article abstracts provided by a journal publisher. An RSS reader "pulls" the data from the RSS feed providers and shows it to the user. You can think of an RSS reader as your personalized magazine, with data from various magazines, journals, newspapers, websites, and other content providers, all collected and displayed in one place.
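To make the idea of "combining the lists into one single list" concrete, here is a minimal sketch in Python of what a reader does after it has pulled the feeds. The two feeds, their titles, and the example.org links are made up for illustration, and the sketch assumes simple RSS 2.0 feeds in which every item has a title, a link, and a pubDate. A real reader would download the feeds from the providers and handle many more cases.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny made-up RSS 2.0 documents standing in for real journal feeds.
FEED_A = """<rss version="2.0"><channel>
<title>Journal A</title>
<item><title>Article A1</title><link>http://example.org/a1</link>
<pubDate>Mon, 02 Jun 2008 09:00:00 GMT</pubDate></item>
<item><title>Article A2</title><link>http://example.org/a2</link>
<pubDate>Wed, 04 Jun 2008 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<title>Journal B</title>
<item><title>Article B1</title><link>http://example.org/b1</link>
<pubDate>Tue, 03 Jun 2008 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Turn one RSS 2.0 document into a list of entry dictionaries."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the same format as email dates.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return entries


def merge_feeds(feeds):
    """Combine the entries of several feeds into one list, newest first."""
    merged = [entry for xml_text in feeds for entry in parse_feed(xml_text)]
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


combined = merge_feeds([FEED_A, FEED_B])
for entry in combined:
    print(entry["published"].date(), entry["source"], "-", entry["title"])
```

The point is only that entries from different sources end up in one list sorted by date, which is what makes a reader easier to organize than a pile of notification emails.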

With an RSS reader, I only need to read my reader daily, or weekly, or on whatever schedule I like. The latest issues will appear when they are available. There is no need to keep a long list of websites for all the journals I want to scan, and no need to remember or check the release schedule of each journal. Moreover, RSS readers usually allow users to tag (label) and save an entry. Therefore, I can keep and file articles that I am interested in for later access. This is difficult to do with email notifications.

As you may notice in the diagram I prepared, an RSS reader is not only for obtaining the latest tables of contents from journals. Nowadays, many content providers have RSS feeds. Many societies have feeds for their latest news, many news websites have feeds for their news stories, many discussion groups have feeds for the latest posts, and a feed is nearly a standard service for any blog. Therefore, an RSS reader is actually a one-stop personalized news service that combines all the news you want to keep track of.

A quick online search will return many how-to guides on using RSS feeds and RSS readers (also known as RSS aggregators). I think the best way to learn how to use RSS feeds is to learn how to use an RSS reader. I have tried various RSS readers myself, and found that two of them suit my needs. One is the older version of FeedDemon (not 3.0), which needs to be downloaded and installed on a computer. The other is Google Reader, an online RSS reader that I can access from any computer with internet access. They may not be the best, so you need to try them yourself to see which one suits your own needs. Google has an interesting video that illustrates how to use Google Reader, which I think also illustrates the idea of RSS feeds in general:

To illustrate how I use an RSS reader for research, these are some sites with feeds I need:

Next time you visit a website, look for the RSS or XML icons. There are fewer and fewer websites that do not provide an RSS feed. :)

Update 2014-03-23: Google Reader has been shut down. I have not yet found an online service as good as Google Reader was for me. :(  Any suggestions?

Article: "Potential Problems in the Statistical Control of Variables ..." by Becker 2005

Article: Becker, T. E. (2005). Potential problems in the statistical control of variables in organizational research: A qualitative analysis with recommendations. Organizational Research Methods, 8, 274-289. [Abstract]

In psychological studies, it is common to include variables as "control variables," for example, age, gender, educational level, and other similar variables. These are usually demographic variables, but sometimes they are other variables specific to a particular research context. The researchers believe that the effects of these variables should be "controlled for" before investigating the predictive power of the variables of interest. This practice is so common that sometimes we (including me) just think it should be done, without really asking ourselves why.

Becker reviewed a random sample of 60 articles from four top journals, and summarized the problems found in the common practice of using control variables. Among the various problems highlighted, I think the most important one is the lack of explanation. If the control variables are correlated with the proposed predictors but are entered first, we are letting the control variables "claim" the predictive power shared by the control variables and the predictors. The question is whether attributing this shared effect to the control variables is theoretically justified.

For example, consider this hypothetical situation. Assume we have two predictors of salary, intelligence and educational level attained. Usually, we will include educational level as a control variable. However, what if educational level is actually (at least partly) influenced by intelligence, that is, the higher the intelligence, the higher the educational level attained? If this is the case, then the path is intelligence->educational level->salary. According to the view on hierarchical regression analysis by Cohen, Cohen, West, and Aiken (2003), predictors entered in a subsequent step must not be a cause of the predictors entered in previous steps. If we adopt this perspective, intelligence should be entered first, not educational level.
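The effect of the order of entry can be seen in a small simulation. The following is only a sketch in Python under assumed values: the path coefficients (0.6, 0.3, 0.4), the noise levels, the sample size, and the random seed are all made up for illustration and come from neither Becker nor Cohen et al. The R-square with two predictors is computed from the three correlations using the standard formula for two-predictor regression.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_two(r_y1, r_y2, r_12):
    """R-square of y regressed on two predictors, from the correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical data: intelligence -> education -> salary, plus a direct
# path from intelligence to salary. All coefficients are made up.
rng = random.Random(2005)
n = 5000
intelligence = [rng.gauss(0, 1) for _ in range(n)]
education = [0.6 * i + rng.gauss(0, 0.8) for i in intelligence]
salary = [0.3 * i + 0.4 * e + rng.gauss(0, 0.8)
          for i, e in zip(intelligence, education)]

r_si = pearson_r(salary, intelligence)
r_se = pearson_r(salary, education)
r_ie = pearson_r(intelligence, education)

r2_full = r_squared_two(r_si, r_se, r_ie)   # both predictors entered
r2_edu_first = r_se ** 2                    # step 1: education only
r2_int_first = r_si ** 2                    # step 1: intelligence only
delta_int_after_edu = r2_full - r2_edu_first
delta_edu_after_int = r2_full - r2_int_first

print("Education first:    step 1 R2 = %.3f, added by intelligence = %.3f"
      % (r2_edu_first, delta_int_after_edu))
print("Intelligence first: step 1 R2 = %.3f, added by education = %.3f"
      % (r2_int_first, delta_edu_after_int))
```

The final R-square is the same in both orders. What changes is which variable gets credit for the shared part: whichever is entered first "claims" it, so intelligence looks much less important when it is entered after educational level.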

Alternatively, we can understand this hypothetical case as a mediation model. The R-square in the first step, with intelligence only, reflects the total effect of intelligence (direct effect on salary plus indirect effect through educational level). If educational level is actually influenced by intelligence but is entered first, the R-square in the first step does not reflect the total effect of educational level alone, as it is confounded by the effect of intelligence.
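The statement that the total effect is the direct effect plus the indirect effect can also be checked numerically. This is again only a sketch in Python with made-up coefficients, noise levels, sample size, and seed. It uses unstandardized least-squares slopes computed from covariances, for which the decomposition holds exactly within a sample.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical data with the path intelligence -> education -> salary.
rng = random.Random(42)
n = 5000
intelligence = [rng.gauss(0, 1) for _ in range(n)]
education = [0.6 * i + rng.gauss(0, 0.8) for i in intelligence]
salary = [0.3 * i + 0.4 * e + rng.gauss(0, 0.8)
          for i, e in zip(intelligence, education)]

s_ii = cov(intelligence, intelligence)
s_ee = cov(education, education)
s_ie = cov(intelligence, education)
s_is = cov(intelligence, salary)
s_es = cov(education, salary)

# Total effect: slope of salary on intelligence alone.
total = s_is / s_ii
# Path a: slope of education on intelligence.
a = s_ie / s_ii
# Slopes of salary on intelligence and education entered together.
det = s_ii * s_ee - s_ie ** 2
direct = (s_is * s_ee - s_es * s_ie) / det   # intelligence, given education
b = (s_es * s_ii - s_is * s_ie) / det        # education, given intelligence
indirect = a * b

print("total = %.3f, direct = %.3f, indirect = %.3f, direct + indirect = %.3f"
      % (total, direct, indirect, direct + indirect))
```

With the values assumed here, the total effect should be close to 0.3 + 0.6 x 0.4 = 0.54. The slope of intelligence in the first step carries both parts, while its slope after educational level is entered carries only the direct part.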

Certainly, Cohen et al.'s perspective is not the only one in using hierarchical regression analysis. For example, sometimes we need to enter several variables first because they are well-established predictors, and we need to demonstrate that the predictors we propose make an additional contribution over and above the well-established predictors. In this case, we are not asserting a causal order between the well-established predictors and the predictors we propose. The order is more a practical one, partly determined by which predictors happened to be studied first.

Nevertheless, Becker reminds us that the inclusion of control variables and their order of entry, like those of all other variables in the regression analysis, should be justified theoretically.

However, the normative pressure is strong, even in the research community. Anyway, I think it does not hurt if all we need is to think more. :)


Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. NJ: Lawrence Erlbaum Associates.


Update 2014-04-12: I no longer use CiteULike. I now use Mendeley. It has a local client, so I can access the database offline. I don't use the file sync service, so the free plan suits my needs very well.

One frequent task in doing research is searching for journal articles. In the old days, without electronic copies and easy-to-use database applications, we relied on file cabinets, hanging folders, index cards, etc. to file the numerous articles that accumulated. Recently, I found a free (as of 2008-06-30) online service that is very useful in the internet era: CiteULike. According to the FAQ of the website, "CiteULike is a free service to help you to store, organise and share the scholarly papers you are reading." I am a new user, and have not yet decided whether I will use it as my main platform to file articles in the future. Nevertheless, I would like to share my initial experience here, so other visitors may try it and see whether it is good for them. As the FAQ on the website is very detailed, instead of repeating the information here, I would like to share how I am using the service.

Nowadays, nearly all major academic databases are accessible online with a subscription. Moreover, most major publishers have their own websites and online tables of contents for their journals. With CiteULike, when I find a useful article in the web browser, I can post it to my CiteULike personal database (library), and tag it with my own keywords. (For users of, CiteULike is for journal articles.) If the database or publisher website supports CiteULike, then the basic information, such as title, abstract, authors, and journal, will be entered automatically. Even if the website does not support CiteULike, it is not very difficult to enter (copy and paste) the information manually. I can then search my library for articles using the keywords. If several articles are related to the same project, I may also create a keyword for this project and tag all the related articles with it, creating a bibliography for this particular project.
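The tagging idea itself is simple and does not depend on any particular service. Here is a minimal sketch in Python of a keyword-tagged library. It is not CiteULike's actual interface: the class name, the method names, and the tags (including the project tag "project-x") are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Library:
    """A toy personal library: each keyword maps to a set of citations."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def post(self, citation, *tags):
        """File a citation under one or more keywords."""
        for tag in tags:
            self._by_tag[tag.lower()].add(citation)

    def search(self, tag):
        """Return all citations filed under a keyword, sorted by text."""
        return sorted(self._by_tag.get(tag.lower(), set()))


library = Library()
library.post("Becker (2005), Organizational Research Methods",
             "control-variables", "regression", "project-x")
library.post("Cohen et al. (2003), Applied multiple regression",
             "regression", "project-x")

# All articles tagged with the hypothetical project keyword form a
# bibliography for that project.
for citation in library.search("project-x"):
    print(citation)
```

Because one article can carry several keywords, the same entry can appear both under a topic ("regression") and under a project, which is harder to do with folders or index cards.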

Another important aspect of CiteULike is social. I can find articles other users posted, and may also find other users who have posted an article that I posted to my library. In my search, I have found some articles useful to my projects by browsing the libraries of other users who had posted articles that are also in my library and are critical to a particular topic. I would otherwise have missed those articles, as they are related to the topic but not in my discipline. Conversely, I can also share my library, and let other people find articles I collected for a topic.

Despite the social aspect of CiteULike, a user can, for whatever reason, mark an article as private. It is still in the personal library, but only the user will see this entry. Therefore, even if a user does not like the idea of letting others know what articles are being cataloged, CiteULike still serves very well as an online private "secret" library. For me, I will use CiteULike as both a private and a public database, as occasionally there are technical reasons why some lists are for private use, or not yet ready for public sharing.

There are certainly other well-developed alternatives, such as EndNote and RefWorks. I have also been using a self-created Microsoft Access database for nearly ten years, and likely will keep using it. Nevertheless, CiteULike is easy to use, accessible anywhere, and free. Even my self-created Access database is not really free, as I need to write and debug the code myself, which can be quite time-consuming.

I think I will still use my personal Access database for a long time due to the number of entries I have accumulated over the years. I will use CiteULike mainly for sharing articles that I collected for selected topics. That is why the number of articles accumulated in my library in the last few months is very small. I will continue to try CiteULike for several months, to see whether it really suits my needs, and whether I will migrate to CiteULike and use it as my main platform for cataloging articles.