e-TA 15: Censored Regression Models
Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1
Data
You can download the data set, called weco14.txt from the Econ 508 web site. Save it in your preferred directory.
Then you can load it in R. If you saved it in your hard drive in a folder called econ508 you can load it by typing:
weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")
or you can retrieve it from the web
weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
y sex dex lex kwit job_tenure status
1 13.72579 0 38 10 FALSE 277 TRUE
2 17.15373 1 55 11 TRUE 173 TRUE
3 13.62992 1 45 12 FALSE 410 TRUE
4 13.03998 1 41 11 FALSE 247 TRUE
5 13.19510 1 42 10 FALSE 340 TRUE
6 16.43010 0 38 12 FALSE 517 TRUE
Heckman two-step procedure
To estimate the equation of productivity, using only non-quitters. To do so we need to use the Heckman two-step procedure following Lecture 21. But first we need to crate a dummy variable that identifies non quitters, and run a probit regression:
weco$lex2<-weco$lex^2
weco$kwit<-as.numeric(weco$kwit)
weco$nonkwit<-ifelse(weco$kwit==1,0,1)
head(weco)
y sex dex lex kwit job_tenure status lex2 nonkwit
1 13.72579 0 38 10 0 277 TRUE 100 1
2 17.15373 1 55 11 1 173 TRUE 121 0
3 13.62992 1 45 12 0 410 TRUE 144 1
4 13.03998 1 41 11 0 247 TRUE 121 1
5 13.19510 1 42 10 0 340 TRUE 100 1
6 16.43010 0 38 12 0 517 TRUE 144 1
After we have all the variables we follow the “recipe” in Lecture 21
- Estimate binary choice model by probit
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)
Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"),
data = weco)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4622 0.3041 0.5762 0.7578 1.3164
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.698385 2.431400 -3.578 0.000347 ***
sex -0.271552 0.113076 -2.401 0.016328 *
dex 0.058008 0.008255 7.027 2.12e-12 ***
lex 1.155336 0.382383 3.021 0.002516 **
lex2 -0.047018 0.015253 -3.083 0.002052 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 745.97 on 682 degrees of freedom
Residual deviance: 679.38 on 678 degrees of freedom
AIC: 689.38
Number of Fisher Scoring iterations: 4
- Construct \(\hat{\lambda_i} = \frac{\phi(x'_i\gamma)}{\Phi(x'_i\gamma)}\)
weco$lambda <- dnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef))/pnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef))
- Re estimate original model using only \(y_i > 0\) observations but including \(\hat{\lambda_i}\) as additional explanatory variable
weco_nk<-weco[which(weco$nonkwit==1),]
lm<-lm(y~sex+dex+lex+lex2+lambda, data=weco_nk)
summary(lm)
Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)
Residuals:
Min 1Q Median 3Q Max
-3.7638 -0.7006 -0.0036 0.7645 3.2609
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.912068 8.058899 1.602 0.10972
sex -0.691889 0.215042 -3.217 0.00137 **
dex 0.074247 0.039285 1.890 0.05933 .
lex -0.092592 0.982103 -0.094 0.92492
lex2 0.003787 0.040014 0.095 0.92464
lambda -1.623275 1.652550 -0.982 0.32642
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared: 0.3514, Adjusted R-squared: 0.3451
F-statistic: 55.9 on 5 and 516 DF, p-value: < 2.2e-16
Then you can test for sample selectivity problems by checking the significance of \(\hat{\lambda_i}\), as remarked in Lecture 21. Please indicate what model you should use after all, based on the sample selectivity test.
Powell’s estimator
As pointed out by Lecture 21 a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we could use Powell estimator which can be achieved in R by using the crq.fit.pow
in the quantreg
package:
require(quantreg)
Loading required package: quantreg
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
powell<- crq.fit.pow(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2),weco$y,weco$nonkwit,left=TRUE)
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell$coef
[1] 4.92366053 -0.89661622 0.11193125 0.92806584 -0.04043428
Note that under certain conditions this works for any \(F\) even if there is heteroskedasticity.
Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu↩