Welcome to a new issue of e-Tutorial. Here we cover the fundamentals of panel data estimation, from organizing a panel data set to testing fixed effects against random effects. The example below uses the theoretical background of Prof. Koenker’s Lecture Note 17 to reproduce the results of Greene (1997). ¹

Data

The first thing you need to do is download Greene’s (1997) panel data set, called greene14.txt, from the Econ 508 web site. Save it in your preferred directory. This is a small panel data set with information on costs and output of 6 different firms in 4 different periods of time (1955, 1960, 1965, and 1970). Your job is to estimate a cost function using basic panel data techniques.

The next step is to load the data into Stata.

  insheet using greene14.txt, clear

Next, we want to transform variables into logs (usually you don’t need to, but it will facilitate the use of panel functions later).

  gen lnc=log(cost) 
  gen lny=log(output)

Finally, we will declare the panel structure of the data:

  xtset firm year
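Before estimating anything, it can help to confirm that Stata sees the panel the way you expect. The following is a minimal sketch, assuming the xtset declaration above succeeded; it only uses the variables lnc and lny created earlier.

```stata
* Describe the panel: number of firms, number of periods, and participation pattern
xtdescribe

* Split the variation of each variable into between-firm and within-firm components
xtsum lnc lny
```

With 6 firms observed in 4 periods each, xtdescribe should report a balanced panel.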

Pooled OLS

Consider a simplified version of equation (1) in Koenker’s Lecture 17:

\[ y_{it} = x_{it} \beta + a_{i} + u_{it} \; \; \; \;(1) \]

The most basic estimator for panel data is pooled OLS (POLS). Johnston & DiNardo (1997) recall that the POLS estimator ignores the panel structure of the data: it treats observations as serially uncorrelated for a given individual, with homoscedastic errors across individuals and time periods:

\[ \beta_{POLS} = (X'X)^{-1} X'y \; \; \; \;(2) \]

In Stata you can estimate it with:

  reg lnc lny

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  1,    22) =  728.51
       Model |   33.617333     1   33.617333           Prob > F      =  0.0000
    Residual |  1.01520396    22  .046145635           R-squared     =  0.9707
-------------+------------------------------           Adj R-squared =  0.9694
       Total |  34.6325369    23  1.50576248           Root MSE      =  .21482

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .8879868   .0328996    26.99   0.000     .8197573    .9562164
       _cons |  -4.174783   .2768684   -15.08   0.000    -4.748973   -3.600593
------------------------------------------------------------------------------
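Equation (2) can also be evaluated directly with Stata's matrix commands. This is only a sketch to connect the formula with the regression output; the matrix names XX, yX, and bPOLS are arbitrary choices, not part of the original tutorial.

```stata
* X'X, with the constant appended as the last column by default
matrix accum XX = lny

* y'X as a 1 x 2 row vector (the dependent variable is listed first)
matrix vecaccum yX = lnc lny

* beta = (X'X)^{-1} X'y
matrix bPOLS = invsym(XX)*yX'
matrix list bPOLS
```

The two entries of bPOLS should reproduce the lny and _cons coefficients reported by reg above.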

Fixed Effects (Within-Groups) Estimators:

In Prof. Koenker’s Lecture 17 we examined the effects of applying the matrices P and Q to the data, where

\[ P = D(D'D)^{-1} D' \]

transforms the data into individual means (D being the matrix of individual dummy variables) and

\[ Q = I-P \]

transforms the data into deviations from individual means.

The within-groups (or fixed effects) estimator is then given by:

\[ \beta_{Within} = (X'QX)^{-1} X'Qy \; \; \; \;(3) \]

Because Q is symmetric and idempotent, \((QX)'(QX) = X'QX\) and \((QX)'(Qy) = X'Qy\), so this is equivalent to regressing Qy on QX, i.e., using data in the form of deviations from individual means. You can obtain the within-groups estimator using the built-in command xtreg with the fe option:

   xtreg lnc lny, fe

Fixed-effects (within) regression               Number of obs      =        24
Group variable: firm                            Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

                                                F(1,17)            =    121.66
corr(u_i, Xb)  = 0.8495                         Prob > F           =    0.0000

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .6742789   .0611307    11.03   0.000     .5453044    .8032534
       _cons |  -2.399009    .508593    -4.72   0.000    -3.472046   -1.325972
-------------+----------------------------------------------------------------
     sigma_u |  .36730483
     sigma_e |  .12463167
         rho |  .89675322   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5, 17) =     9.67               Prob > F = 0.0002

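Note that the intercept above is an average of the individual intercepts. You can also compute the within-groups slope by hand. The coefficient on lny matches, but the standard errors from this regression are based on 22 degrees of freedom rather than 17, because it does not account for the 6 estimated firm means: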

   egen lnc_bar = mean(lnc), by(firm)
   egen lny_bar = mean(lny), by(firm)
   gen lnc_we = lnc-lnc_bar
   gen lny_we = lny-lny_bar
   reg lnc_we lny_we
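The by-hand regression and xtreg, fe share the same residual sum of squares, so their standard errors differ only through the degrees of freedom. As a sketch, assuming the variables lnc_we and lny_we created above are still in memory, the standard error can be rescaled as follows:

```stata
quietly reg lnc_we lny_we

* e(df_r) is the residual df of the by-hand regression (24 - 2 = 22);
* the within estimator has 24 - 6 - 1 = 17 residual degrees of freedom
display _se[lny_we]*sqrt(e(df_r)/(24-6-1))
```

The displayed value should agree with the standard error of lny in the xtreg, fe output.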

Between-Groups Estimators:

Another useful estimator is provided when you use only the group means, i.e., transforming your data by applying the matrix P to equation (1) above:

\[ \beta_{Between} = [X'PX]^{-1} X'Py \; \; \; \;(4) \]

In Stata:

  xtreg lnc lny, be

Between regression (regression on group means)  Number of obs      =        24
Group variable: firm                            Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

                                                F(1,4)             =    236.23
sd(u_i + avg(e_i.))=  .1838474                  Prob > F           =    0.0001

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .9110734   .0592772    15.37   0.000     .7464935    1.075653
       _cons |  -4.366618   .4982409    -8.76   0.001    -5.749957   -2.983279
------------------------------------------------------------------------------

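or, by hand, using the firm means created earlier. The coefficients match, but the standard errors do not, because this regression treats the 24 repeated firm means as independent observations: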

   reg lnc_bar lny_bar
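To get the between-groups standard errors by hand as well, one option is to reduce the data to one observation per firm before running OLS. The sketch below assumes a balanced panel, as here; preserve and restore keep the original data set intact.

```stata
preserve

* Keep one row per firm holding the firm means of lnc and lny
collapse (mean) lnc lny, by(firm)

* OLS on the 6 firm means: 6 - 2 = 4 residual degrees of freedom
reg lnc lny

restore
```

This regression has 4 residual degrees of freedom, the same as the F(1,4) reported by xtreg, be.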

Random Effects:

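Following Prof. Koenker’s Lecture 17, treat the \(a_{i}\)’s as random. The model can then be estimated via GLS:

\[ \beta_{GLS} = [X' \Omega^{-1} X]^{-1}X'\Omega^{-1} y \; \; \; \; (5) \]

where \(\Omega = \sigma_{u}^{2} I_{nT} + T \sigma_{\alpha}^{2} P\), with \(\sigma_{u}^{2}\) the variance of \(u_{it}\) and \(\sigma_{\alpha}^{2}\) the variance of the individual effects \(a_{i}\).

Estimating this with Stata is straightforward. Note that Stata's labels are the reverse of the notation used here: in the output, sigma_u refers to the individual effects and sigma_e to the idiosyncratic error.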

   xtreg lnc lny, re

Random-effects GLS regression                   Number of obs      =        24
Group variable: firm                            Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

                                                Wald chi2(1)       =    268.10
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .7963203   .0486336    16.37   0.000     .7010002    .8916404
       _cons |  -3.413094   .4131166    -8.26   0.000    -4.222788     -2.6034
-------------+----------------------------------------------------------------
     sigma_u |  .17296414
     sigma_e |  .12463167
         rho |  .65823599   (fraction of variance due to u_i)
------------------------------------------------------------------------------

An alternative approach would be using Nerlove’s Lemma in Lecture Notes 17 where we transform the model to obtain spherical errors. For example we transform \(y\) by

\[ \sigma_{u} \Omega^{-1/2} y = (\theta P + Q) y = y -(1- \theta) \bar{y} \]

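where \(\theta=\frac{\sigma_{u}}{(\sigma_{u} ^2 + T \sigma_{\alpha} ^2)^{1/2}}\) and similarly for the other variables. In Stata, the same random effects estimates can be reproduced as a matrix-weighted average of the within and between estimators, where each estimator is weighted by the other's covariance matrix: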

   qui: xtreg lnc lny, fe
   matrix bW=e(b) 
   matrix VW=e(V)
   qui: xtreg lnc lny, be
   matrix bB=e(b)
   matrix VB=e(V)

   matrix V=VW+VB 
   matrix Vinv=syminv(V) 
   matrix D=VW*Vinv 
   matrix P1=D*bB' 
   matrix I2=I(2) 
   matrix RD=I2-D 
   matrix P2=RD*bW' 
   matrix bRE=P1+P2 
   matrix list bRE


bRE[2,1]
              y1
  lny  .79632032
_cons  -3.413094
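The transformation \(y - (1-\theta)\bar{y}\) itself can also be applied directly. The following is a sketch under stated assumptions: it takes the variance components from xtreg, re (where, in Stata's labels, e(sigma_e) is the idiosyncratic standard deviation and e(sigma_u) that of the individual effect), hard-codes T = 4, and uses the illustrative variable names lnc_m, lny_m, lnc_t, lny_t, and one_t.

```stata
quietly xtreg lnc lny, re

* theta in the notation of the text: idiosyncratic s.d. over the s.d. of the firm-mean error
scalar theta = e(sigma_e)/sqrt(e(sigma_e)^2 + 4*e(sigma_u)^2)

* Firm means of each variable
egen lnc_m = mean(lnc), by(firm)
egen lny_m = mean(lny), by(firm)

* Quasi-demeaned variables; the constant 1 becomes 1 - (1 - theta) = theta
gen lnc_t = lnc - (1-scalar(theta))*lnc_m
gen lny_t = lny - (1-scalar(theta))*lny_m
gen one_t = scalar(theta)

* OLS on the transformed data, with the transformed constant entered explicitly
reg lnc_t lny_t one_t, nocons
```

The coefficients on lny_t and one_t should match the slope and intercept from xtreg, re; the standard errors need not match exactly, since this regression re-estimates the error variance.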

What should I use: Fixed Effects or Random Effects? A Hausman (1978) Test Approach

Hausman (1978) suggested a test to check whether the individual effects (\(a_{i}\)) are correlated with the regressors (\(X_{it}\)):

Under the Null Hypothesis: Orthogonality, i.e., no correlation between individual effects and explanatory variables. Both the random effects and fixed effects estimators are consistent, but the random effects estimator is efficient, while the fixed effects estimator is not.
Under the Alternative Hypothesis: Individual effects are correlated with the X’s. In this case, the random effects estimator is inconsistent, while the fixed effects estimator is consistent and efficient.

Greene (1997) recalls that, under the null, the estimates should not differ systematically. Thus, the test is based on the contrast vector \(\beta_{GLS} - \beta_{W}\), through the quadratic form H:

\[ H = [\beta_{GLS} - \beta_{W}]'[V(\beta_{W})-V(\beta_{GLS})]^{-1} [\beta_{GLS} - \beta_{W}] \sim \chi^{2}_{(k)} \; \; \; \; (6) \]

where k is the number of regressors in X (excluding constant). In Stata:

   xtreg lnc lny, fe 
   estimates store fe 
   xtreg lnc lny, re 
   estimates store re
   hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
         lny |    .6742789     .7963203       -.1220414        .0370369
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       10.86
                Prob>chi2 =      0.0010

Based on the test above, the test statistic of 10.86 is greater than the 5% critical value of a Chi-squared with 1 degree of freedom, 3.84. Therefore, we reject the null hypothesis. Given this result, the preferred model is fixed effects.
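With a single regressor, the Hausman statistic reduces to the squared ratio of the coefficient difference to its standard error, so it can be checked from the numbers in the hausman table. The sketch below only plugs in the values reported above.

```stata
* Hausman statistic with one regressor: ((b - B)/S.E.)^2
display (-.1220414/.0370369)^2

* 5% critical value of a chi-squared with 1 degree of freedom
display invchi2tail(1, .05)

* p-value of the reported statistic
display chi2tail(1, 10.86)
```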

Appendix: Recovering Alphas from Fixed Effects (Least Squares Dummy Variables)

Suppose you are interested in obtaining a specific regression for firm 3. For example, many international economists need a country-specific equation when they are dealing with country panels. If you are in this situation, don’t worry. The fixed effects estimators already take all individual effects into account. The only mysterious thing is that the individual intercepts are not shown in the regression output.

You can recover the intercept of your cross-sectional unit after using fixed effects estimators. For the example above, let’s estimate the fixed effects model including a dummy variable for each firm instead of a common intercept (some authors call this Least Squares Dummy Variables, but it is the same fixed effects you saw earlier). In Stata:

   reg lnc lny d1 d2 d3 d4 d5 d6, nocons


      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  7,    17) = 2581.72
       Model |  280.714267     7  40.1020382           Prob > F      =  0.0000
    Residual |  .264061918    17  .015533054           R-squared     =  0.9991
-------------+------------------------------           Adj R-squared =  0.9987
       Total |  280.978329    24  11.7074304           Root MSE      =  .12463

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .6742789   .0611307    11.03   0.000     .5453044    .8032534
          d1 |  -2.693527   .3827874    -7.04   0.000    -3.501137   -1.885916
          d2 |  -2.911731   .4395755    -6.62   0.000    -3.839154   -1.984308
          d3 |  -2.439957   .5286852    -4.62   0.000    -3.555386   -1.324529
          d4 |  -2.134488   .5587981    -3.82   0.001    -3.313449    -.955527
          d5 |  -2.310839     .55325    -4.18   0.001    -3.478094   -1.143583
          d6 |  -1.903512   .6080806    -3.13   0.006     -3.18645   -.6205737
------------------------------------------------------------------------------

The slope is obviously the same. The only change is that the common intercept is replaced by 6 dummies, each representing a cross-sectional unit. Now suppose you would like to know whether the differences in the firm effects are statistically significant. How can you do that?

Re-run the regression above, this time including both the intercept and the dummies:

   reg lnc lny d1 d2 d3 d4 d5 d6


note: d1 omitted because of collinearity

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  6,    17) =  368.77
       Model |   34.368475     6  5.72807917           Prob > F      =  0.0000
    Residual |  .264061918    17  .015533054           R-squared     =  0.9924
-------------+------------------------------           Adj R-squared =  0.9897
       Total |  34.6325369    23  1.50576248           Root MSE      =  .12463

------------------------------------------------------------------------------
         lnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |   .6742789   .0611307    11.03   0.000     .5453044    .8032534
          d1 |          0  (omitted)
          d2 |  -.2182041   .1052027    -2.07   0.054    -.4401624    .0037542
          d3 |   .2535693   .1716665     1.48   0.158    -.1086153    .6157539
          d4 |   .5590387   .1982915     2.82   0.012     .1406801    .9773973
          d5 |   .3826881   .1933058     1.98   0.064    -.0251516    .7905277
          d6 |   .7900151   .2436915     3.24   0.005      .275871    1.304159
       _cons |  -2.693527   .3827874    -7.04   0.000    -3.501137   -1.885916
------------------------------------------------------------------------------

Note that one of the dummies (d1) is dropped because of perfect collinearity with the constant, and all other dummies are now expressed as the difference between their original value and the constant. (The value of the constant in this second regression equals the value of the dropped dummy in the previous regression. The dropped dummy serves as the benchmark.)

Obtain the R-squared from the restricted (POLS) and unrestricted (fixed effects with dummies) models:
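The text does not say where the dummies d1 to d6 come from; they may already be in the data set. If they are not, Stata's factor-variable notation gives the same two regressions without creating them. This is a sketch that assumes firm is coded with non-negative integers and that firm 1 is the lowest code.

```stata
* All six firm intercepts, no common constant (ibn. suppresses the base category)
reg lnc lny ibn.firm, nocons

* Common constant plus five contrasts; the lowest firm code is the omitted base
reg lnc lny i.firm
```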

   qui: reg lnc lny
   scalar R2OLS=e(r2)
   qui: reg lnc lny d1 d2 d3 d4 d5 d6
   scalar R2LSDV = e(r2)

Perform the traditional F-test, comparing the unrestricted regression with the restricted regression:

\[ F_{(n-1, nT-n-k)}=\frac{[ (R_{u} ^2 - R_{p} ^2) / (n-1) ]}{[ (1 - R_{u} ^2) / (nT - n - k) ]} \; \; \; \;(7) \]

where the subscript “u” refers to the unrestricted regression (fixed effects with dummies), and the subscript “p” to the restricted regression (POLS). Under the null hypothesis of a common intercept, POLS is the more efficient estimator.

   scalar F=((R2LSDV-R2OLS)/(6-1))/((1-R2LSDV)/(24-6-1)) 
   scalar list F

      F =  9.6715307

The result above can be compared with the critical value of F(5,17), which equals 4.34 at the 1% level. Therefore, we reject the null hypothesis of a common intercept for all firms. This is the same F statistic that xtreg, fe reports in its last line (F(5, 17) = 9.67).
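The critical value and the p-value can be obtained in Stata rather than from a table, and the same F statistic can be computed from the residual sums of squares reported in the two regression outputs. The sketch below plugs in those reported numbers; the final two lines assume firm is coded with non-negative integers so that factor-variable notation applies.

```stata
* 1% critical value of F(5,17)
display invFtail(5, 17, .01)

* p-value of the computed statistic
display Ftail(5, 17, 9.6715307)

* Same test from the residual sums of squares (POLS vs. dummies), 5 restrictions
display ((1.01520396 - .264061918)/5)/(.264061918/17)

* Joint test that all firm contrasts are zero
quietly reg lnc lny i.firm
testparm i.firm
```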

References:

Greene, William, 1997, Econometric Analysis, Third Edition, NJ: Prentice-Hall.
Hausman, Jerry, 1978, “Specification Tests in Econometrics,” Econometrica, 46, pp. 1251-1271.
Johnston, Jack, and John DiNardo, 1997, Econometric Methods, Fourth Edition, NY: McGraw-Hill.
Koenker, Roger, 2004, “Panel Data,” Lecture 13, mimeo, University of Illinois at Urbana-Champaign.

¹ Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu

Contact	Office Hours	E-mail
Prof. Roger Koenker	M. & W. 2:30-3:30 or by appointment (126 DKH)	rkoenker@illinois.edu
TA Nicolas Bottan	TBA	bottan2@illinois.edu

Applied Econometrics
Econ 508 - Fall 2014

Professor: Roger Koenker

TA: Nicolas Bottan

e-TA 10: Panel Data: Basics
