Research Paper in Introductory Econometrics
This material is replicated on a number of sites as part of the SERC Pedagogic Service Project
Through this independent research project, students experience the process of doing real economics research using appropriate econometric methods.
- Develop an understanding of how economists conduct applied research. This means more than simply learning the statistical methods. In order to use the methods appropriately, students must know the underlying theory as well as the existing literature on the issue.
- Develop important (marketable) computer skills. To handle large data sets and complex econometric techniques, several specialized software packages have emerged in the market. The program used in this class is SAS. It is one of the most widely used statistical programming languages in the world. While some work can be done with minimal knowledge of SAS coding, it is important for students to learn the basics of SAS syntax and logic to be an efficient econometrician.
- Develop the ability to critically evaluate others' research.
- Develop written and oral communication skills.
Context for Use
Applied research papers in econometrics classes are common across the discipline. Some instructors supply datasets for students to replicate "famous" results, while others require students to collect their own data. I use replication as a first project in the course (midterm project) and require an independent research project as the capstone learning experience for the course.
Description and Teaching Materials
The research paper serves as a capstone to the course. Students find their own topic, research the literature, collect data and use appropriate econometric techniques to analyze the topic. To facilitate the process, students are required to submit a proposal for their paper. This happens immediately after fall break, at the mid-point of the semester. By that time, students have a good grasp of multiple regression, including basic modeling issues like log transformations, scaling, etc.
Teaching Notes and Tips
The key is scaffolding the 6-week process so that students end with an econometrically rigorous and (relatively) complete paper. Of course, there is no way these kinds of papers can meet the level of thoroughness that you would expect out of a semester- or year-long independent research project. The instructor has to make deliberate decisions on where students should and should not devote their scarce time.
One area I sacrifice is the literature. This is not a thesis and does not require a full-blown literature review. Having said that, students do need to have read at least a handful (6-8 is a reasonable expectation) of papers on the topic. I should note that many of these papers are related to other term papers students have written or are writing in their upper-level electives. For example, a student writing a paper about the literature on the gender-wage gap for a labor class will already have an extensive knowledge of the literature. The implication here is that there are spillovers from other classes that I rely on to make this project successful. What they will not have done in that class, however, is a full-blown, rigorous econometric study. Many of the papers I see in econometrics are like this.
The most challenging part is to get them to develop enough of a theory so that they can make the appropriate econometric decisions. For example, if they are looking at the price of beach-front housing on the coast, they need to understand and explain the appropriateness of the hedonic model and its assumptions in the context of this market in order to address the question of simultaneity. Otherwise, students are doing little more than an "applied regression" paper (a statistics project vs. an econometrics project).
Furthermore, since the papers are individualized, each topic and dataset will present its own unique set of econometric challenges. These include (1) multicollinearity, (2) incorrect functional form, (3) heteroscedasticity, (4) autocorrelation, (5) omitted variables, (6) measurement error and (7) simultaneity. Students are expected to address the relevant problems in a satisfactory way. The challenge is to get them to think about their data and theoretical problems early on so that it is not merely an exercise in data-mining.
To help with that, I have developed a series of short homework assignments to (1) keep them on task and (2) lead them to address the requisite issues covered in the course:
- After their proposals are approved, I require a 2-page written summary where they discuss each independent variable theoretically. They explain from theory the effects it should have on the dependent variable and why. In addition, I ask them to pay close attention to two things: (1) whether the variable is endogenous and why (or, if it is exogenous they must justify that); (2) whether the theoretical relationship is linear or non-linear. I also ask them to sketch the XY scatter plot from a theoretical point of view (remember: they have not collected the data yet).
- Following that, their data are due. They are required to come to class with the data imported into SAS. I check their data one-by-one and we discuss issues of dummy variables, transformations, etc. The data are generally due 10-14 days after the proposal. This leaves 4 weeks in the semester for them to complete the econometric work and write the paper.
The final challenge has to do with the timing of content. As they are doing their papers over the last 4 weeks, we are covering topics such as limited dependent variables and panel data. I end the delivery of new content before Thanksgiving, leaving 2 weeks of class time for them to work on their projects in class and to take an in-class final exam (I use the final exam period for presentations). This timing means that students doing topics using, say, logistic regression do not have that knowledge until there are 2 weeks left in the class. Thus, I have all students begin with a benchmark OLS regression model. A lot of diagnostics can be done at this stage, even if OLS is inefficient due to the non-linearities. For example, multicollinearity can be dealt with in OLS.
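As one illustration of a diagnostic that can be run on the benchmark OLS model, the sketch below computes a variance inflation factor (VIF) for the special case of two regressors, where VIF = 1 / (1 - r^2) and r is the sample correlation between them. The course itself uses SAS; the choice of Python, the function names and the made-up data are assumptions for illustration only.

```python
from math import sqrt


def correlation(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical data: years of schooling and years of experience.
schooling = [10, 12, 12, 14, 16, 16, 18, 20]
experience = [22, 18, 20, 15, 12, 10, 8, 3]

print(round(vif_two_regressors(schooling, experience), 2))
```

A VIF of 1 means the two regressors are uncorrelated in the sample; the larger the value, the more the standard errors of the OLS coefficients are inflated by the overlap between the regressors.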
Since this is an econometrics class, assessment of the papers is biased towards the course objectives. As with all papers, of course, I do expect them to be well-written and complete. But those criteria are treated more as a "minus" if they are not up to par than as something that will make the difference between, say, an A and a B. Thus, grades are determined by:
- whether the economic theory and the specification of the benchmark econometric model are consistent;
- the extent to which the student has correctly diagnosed the relevant econometric problems;
- the extent to which the student has dealt with the econometric problems in an appropriate and convincing way;
- the extent to which the paper is well-written and complete (e.g., is there a reasonable introduction with a clearly defined thesis? has the student done a reasonable amount of literature review for a semester project? has the student written a reflective discussion of the results?)
References and Resources
Damodar Gujarati. Essentials of Econometrics. 4th Edition, New York: McGraw Hill.
Steven A. Greenlaw, Doing Economics, Houghton Mifflin.
Final Paper Assignment
The basic purpose of this course is to prepare students to carry out their own econometric study. Students will be asked to formulate an original econometric model, collect data relevant to the model, use econometric techniques to estimate the model, and interpret the results of the estimation. Econometrics is best learned by actually doing an econometric study. Only then will the "uninitiated" learn the power as well as the pitfalls of econometrics. This handout will outline the steps to writing an econometrics paper.
I. The Model
The model and the data are the starting points of an econometric project. The first step in formulating a model is to select a topic of interest and to consider the model's scope and purpose. In particular, thought should be given to the objectives of the study, what boundaries to place on the topic, what hypotheses might be tested, what variables might be predicted, and what policies might be evaluated. Close attention must be paid, however, to the availability of adequate data. In particular, the model must involve causal relations among measurable variables.
The topic selected can be economic or noneconomic. It could be a particular market (the market for Pitzer graduates, the market for economists, the market for ice cream, the markets for private education), a process (economic development, inflation, unemployment), demographic phenomena (birth rates, death rates), environmental phenomena (water quality, air quality), political phenomena (elections, voting behavior of legislatures), some combination of these, or some other topic.
You are free to choose your own topic. Former students have written papers on a wide variety of subjects. Some paper titles are presented below:
"Air Pollution and Population"
"Birth Rates, Death Rates, and Economic Growth in Developing Economies"
"Demand for and Supply of Higher Education"
"Differential Growth in U.S. Cities"
"Discrimination in the Retail Food Markets"
"Divorce Rates, Birth Rates, and Female Participation in the Labor Force"
"Economic and Social Determinants of Infant Mortality in the United States"
"The Effect of Unemployment on Crime"
"Elections and Money"
"Medical School Applications"
"Police Expenditures and the Deterrence of Crime"
"The Relationship between Exports and Growth in Less Developed Countries"
"Unionization and Strike Activities"
These papers are generally interested in the impact of some independent variable X on a dependent variable Y. But since there are many variables X that have influence on the variable Y, it is important to include all those variables on the right hand side of the equation.
To ensure that the model is both interesting and manageable, it should contain at least three to four independent variables on the right hand side. The model should be formulated as an algebraic, linear, stochastic equation along with a corresponding verbal statement of the meaning of the equation. The expected signs of all the coefficients should be considered. All relevant multipliers, short-run and long-run, should be identified and considered.
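To make this requirement concrete, one possible formulation is sketched here. The dynamic form with a lagged dependent variable, and the symbols $Y_t$, $X_t$, $\beta_0$, $\beta_1$, $\lambda$ and $u_t$, are illustrative assumptions rather than a specification prescribed by the assignment. The long-run multiplier shown assumes $|\lambda| < 1$.

```latex
\begin{align*}
Y_t &= \beta_0 + \beta_1 X_t + \lambda Y_{t-1} + u_t, \qquad |\lambda| < 1, \\
\text{short-run multiplier:}\quad & \frac{\partial Y_t}{\partial X_t} = \beta_1, \\
\text{long-run multiplier:}\quad & \frac{\beta_1}{1 - \lambda}.
\end{align*}
```

In words: Y this period depends linearly on X this period, on Y last period, and on a random disturbance. A one-unit increase in X raises Y by $\beta_1$ in the same period, and a permanent one-unit increase in X eventually raises Y by $\beta_1 / (1 - \lambda)$. A purely static model has no lagged term, so the two multipliers coincide.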
Remember that the ideas above are merely examples of reasonable topics. You should be original and follow your own interests. Perhaps the best choice of a topic is one in which you have prior experience or knowledge. Did you take a course on economic development or do you like to watch basketball games? You will have a head start in these areas because you are already familiar with the basic issues. If you feel particularly uninspired, take a look at Berndt, The Practice of Econometrics, Addison-Wesley, 1991. In any case, you will have to identify and study the previous literature on the subject. Good sources are professors, EconLit, and the Honnold Library On Line Catalog. The relevant literature should indicate, or at least suggest, a model and also hypotheses to be tested, variables to be forecast, and/or policies to be evaluated. It can also be a useful guide to the relevant data.
II. The Data
Data form an essential ingredient in any econometric study, and obtaining an adequate and relevant set of data is an important and often critical part of the econometric project. Data must be available for all the variables in the model.
National Statistical Abstracts, Statistical Yearbooks, or Statistical Handbooks, published annually by most major countries, provide both summary statistics and references to primary sources. For the United States, the best starting point for the acquisition of relevant data is the Statistical Abstract of the United States, which is published annually.
The appendix to the annual Economic Report of the President contains information on fewer variables than the Statistical Abstract, but has a longer time series for these variables. It includes series on income, employment and production. The U.S. Department of Commerce, Bureau of Economic Analysis publishes the Survey of Current Business each month. Business Statistics, the biennial supplement to the Survey, provides historical data and methodological notes for approximately 2,100 series. Depending on the series, the data are published on a monthly, quarterly, and/or annual basis. Some series are seasonally adjusted. Numerous private agencies also collect economic data. Economagic.com provides access to over 200,000 economic time series. The Conference Board collects data on several economic variables, as does the Institute for Social Research at the University of Michigan.
For financial data, there are several primary sources. The Center for Research in Security Prices (CRSP) dataset contains data on market prices and quarterly dividends for every firm listed on the New York Stock Exchange (NYSE) since 1926. The ILS dataset, produced by Interactive Data Corporation (IDC), contains daily stock-trading volume, prices, quarterly dividends, and earnings for all NYSE and AMEX securities, and some OTC securities. The Compustat dataset, produced by Investors Management Sciences, Inc. (IMSI), contains over 20 years of annual data for more than 3,500 stocks.
For international data, the United Nations Statistical Yearbook provides a wealth of data on member countries, as do statistical yearbooks of other international organizations like the OECD. The Federal Reserve Bank of St. Louis puts out International Economic Conditions which gives comparative data for Canada, France, Germany, Italy, Japan, Netherlands, Switzerland, United Kingdom, and the U.S. Various almanacs, sources on the WWW like www.census.gov, and other reference works also abound in statistics. Take a look at the course homepage and the economics department homepage. All of these sources contain data on so many topics that they may suggest a topic for the econometric project. You should also talk to librarians and other professors and just keep your eyes open.
Data can be either time-series or cross-section. For this project it is probably best not to pool data of the two types. Also, it is best to avoid data sets which are too small, say fewer than thirty observations. The data should be examined and, if necessary, refined to make them suitable for the purposes of the model. For time-series data it may be necessary to use seasonal adjustments or perhaps to eliminate certain trends. For both time-series and cross-section data, consideration should be given to whether to divide the data into separate samples or perhaps exclude certain observations. Thus in time-series data it may (or may not) be appropriate to exclude war years or years of a recession. In a cross-section of nations it may be inappropriate to include all countries that are UN members. The developed countries might be treated as one group and the developing countries as another group. Dividing the data this way into subsamples not only leads to more homogeneous data sets but also facilitates the study by allowing comparative analyses.
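The subsample idea and the thirty-observation rule of thumb can be sketched in a few lines. This is a minimal illustration, not part of the assignment: the helper names, the "group" field and the generated country data are all hypothetical.

```python
MIN_OBSERVATIONS = 30  # rule-of-thumb threshold mentioned in the text


def split_by_group(rows, group_key):
    """Group observations (dicts) into subsamples keyed by row[group_key]."""
    subsamples = {}
    for row in rows:
        subsamples.setdefault(row[group_key], []).append(row)
    return subsamples


def too_small(sample, minimum=MIN_OBSERVATIONS):
    """True when a sample has fewer observations than the minimum."""
    return len(sample) < minimum


# Hypothetical cross-section: 40 developed and 25 developing countries.
countries = (
    [{"group": "developed", "growth": 2.0 + 0.01 * i} for i in range(40)]
    + [{"group": "developing", "growth": 4.0 + 0.02 * i} for i in range(25)]
)

for name, sample in split_by_group(countries, "group").items():
    status = "too small" if too_small(sample) else "ok"
    print(name, len(sample), status)
```

Splitting buys more homogeneous samples at the cost of fewer observations per regression, so it is worth checking the size of each subsample before estimating separate models.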
III. The Estimation
After both the model and data have been developed, the next step is to utilize econometric techniques to estimate the model. Your final paper is expected to use multiple regression analysis to estimate your multivariate model and test relevant hypotheses. You can use STATA 14 or any other statistical package for the statistical analysis. Basic statistical packages include Minitab and Excel. For careful work in econometrics you will want to use EViews, STATA, SAS, TSP, LimDep, SPSS or Shazam. For this project it is best if the dependent variable is a quantitative variable. Do make sure that you have enough observations for all the variables and that the dependent and independent variables show some variation over the observations. You should not be estimating any identities, or using the dependent variable on the right hand side of the equation unless it is lagged.
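Before running any regression, the check that every variable shows some variation can be automated. The sketch below is one simple way to do it, assuming the data are held as a dictionary of equal-length lists; the variable names and values are invented for illustration.

```python
def has_no_variation(values):
    """True when every observation of a variable takes the same value."""
    return len(set(values)) <= 1


def constant_variables(data):
    """Return the names of variables that never vary across observations."""
    return [name for name, values in data.items() if has_no_variation(values)]


# Hypothetical sample of four workers.
sample = {
    "wage": [12.5, 15.0, 9.75, 20.0],
    "schooling": [12, 16, 10, 18],
    "union_member": [0, 0, 0, 0],  # constant in this sample
}

print(constant_variables(sample))  # prints ['union_member']
```

A regressor that is constant in the sample is perfectly collinear with the intercept, so its coefficient cannot be estimated.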
IV. The Write-Up
The paper should be approximately 10-15 pages in length. If it is much shorter, it should be very good. If it is much longer, it should be very important. Unless there are reasons for doing otherwise, the best style to use in the final write-up of the econometric project is that of an article in a scholarly journal, a style that is both clear and brief, though never sacrificing clarity for the sake of brevity. The following outline is suggested for your paper:
I. Title Page
II. Introduction
Discuss the nature and objectives of the topic, provide a general description of the scope of the model, and the hypotheses to be tested and/or policies to be evaluated. Here you should motivate your paper by explaining why the issues you are studying are important.
III. Review of Previous Literature
Discuss the approaches and results of previous studies of this topic or related topics. Explain why your paper is better than the previous literature.
IV. Specification of the Model
Define and discuss the specification of your model. What variables are included in the model? Explain why you chose those variables and the role they play in the model. Have you included all the important variables in the model? What are the expected signs of all the coefficients? Explain the stochastic and other assumptions being made in the model.
V. Data Description
Provide complete descriptions of all the data, their sources, refinements used, and their possible biases or other possible weaknesses.
Present the estimates of the model and its related statistics such as standard errors, t statistics and the R². Discuss which coefficients are significant at the 5% and 1% levels. If relevant, include a discussion of possible serial correlation and its correction, a discussion of possible heteroscedasticity and its correction, and a discussion of possible multicollinearity and its correction. Estimate alternative models to test the robustness of the results.
Discuss the signs and magnitudes of the estimated coefficients and their comparisons to predicted or theoretical signs and magnitudes. What have we learned? Consider how the model might be reformulated in future studies, and implications for future econometric research.
Sum up the major results of your study.
Include complete citations of all items referred to in the paper.
If reasonable, provide a table of all the data used. At a minimum, provide the summary statistics for the data.
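The statistics that the outline above asks you to report (coefficient estimates, standard errors, t statistics and R²) can be illustrated with a regression on a single explanatory variable. This is a minimal sketch with made-up data and a hypothetical function name. The paper itself calls for multiple regression, which is better done in a statistical package.

```python
from math import sqrt


def simple_ols(x, y):
    """OLS of y on a constant and x: estimates, slope standard error, t, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)      # sum of squared residuals
    sst = sum((b - mean_y) ** 2 for b in y)   # total sum of squares
    sigma2 = ssr / (n - 2)                    # error variance, n - 2 degrees of freedom
    se_slope = sqrt(sigma2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,          # t statistic for H0: slope = 0
        "r_squared": 1 - ssr / sst,
    }


# Hypothetical data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

for name, value in simple_ols(x, y).items():
    print(f"{name}: {value:.4f}")
```

To judge significance at the 5% or 1% level, the t statistic is compared with the critical value of the t distribution with n - 2 degrees of freedom (n - k in a model with k estimated coefficients).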
Honnold Library carries several journals which specialize in applied economic research like the American Economic Review, Journal of Political Economy, International Economic Review, Industrial and Labor Relations Review, and the Journal of Business and Economics. The Quarterly Review, Economic Review or the Business Review of the various regional Federal Reserve Banks also contain good applied economic research. Since most of you have not read an econometrics paper before, you should take a look at some of these journals.
And finally, the writing style, if you can call it that, of economists differs from that of historians, journalists and non-economists in general. You might take a look at The Writing of Economics by Donald N. McCloskey, or a shorter version titled "Economical Writing" which appeared in Economic Inquiry, April 1985 for some editorial guidelines regarding appropriate writing style.
At any time, I will be happy to assist you in completing the project, but we must all remember that it is your project. The responsibility for picking the topic, clarifying the issues, gathering the evidence, and doing the analysis is yours. I will help you to refine your ideas, to discover and circumvent any research pitfalls you may encounter, to put the finishing touches on your research design and to express your ideas more coherently, but I will not deny you the joy of discovering that you can do this kind of independent research.
The econometrics paper for this course will be developed through four phases during the semester.
Phase 1: Write a 2-3 page essay which poses a research question from any field of economics and develops a strategy for answering that question using regression analysis. The strategy will identify the dependent variable, set of explanatory variables, and the type of data required. This essay will serve as the student's proposal for the semester project. This first essay is due on Wednesday 27 September.
Phase 2: Write a 5-7 page essay which identifies at least two papers published in academic journals or as part of a working paper series that use regression analysis to answer the specific research question of the author's choosing. For example, the papers might both investigate the factors that contribute to economic growth in developing countries. Your essay will review and critique these studies. In particular, your essay will identify the theoretical propositions tested in the papers, identify the dependent and independent variables, describe the data, and discuss any econometric problems and possible solutions. This second essay is due on Wednesday 18 October.
Phase 3: Write a 3-5 page essay which reports the results of your regression analysis. The essay should identify a specific research question, describe the data used to answer that question, present the results, and describe the empirical problems and methods used to correct those problems. After the first submission, students will present their results in class and obtain feedback from fellow students. This third essay is first due on Wednesday 8 November.
Phase 4: Write a 10-15 page research paper which incorporates the edited material from the earlier three essays. This final paper is due by 4pm on Friday 8 December.