e-TA 17: Survival Analysis
Welcome to a new issue of e-Tutorial. This e-TA will focus on on Duration Models (a.k.a. Survival Analysis) in the context of the PS5. 1
Data
You can download the data set, called weco14.txt from the Econ 508 web site. Save it in your preferred directory.
Then you can load it in R. If you saved it in your hard drive in a folder called econ508 you can load it by typing:
weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")
or you can retrieve it from the web
weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
y sex dex lex kwit job_tenure status
1 13.72579 0 38 10 FALSE 277 TRUE
2 17.15373 1 55 11 TRUE 173 TRUE
3 13.62992 1 45 12 FALSE 410 TRUE
4 13.03998 1 41 11 FALSE 247 TRUE
5 13.19510 1 42 10 FALSE 340 TRUE
6 16.43010 0 38 12 FALSE 517 TRUE
Next generate the variables needed:
weco$lex2<-weco$lex^2
Survival Analysis
Kaplan-Meier
To do this kind of analysis in R
we are going to use the package survival
require(survival)
Loading required package: survival
Loading required package: splines
Then we need to identify the “analysis time” variable, and the “failure” variable. The former indicates the duration of the process, while the latter indicates whether the data is censored. In the PS5 data set, “tenure” represents the “analysis-time” variable, i.e., the duration of the process, while “status” represents the “failure” variable, assuming values of 0 if it is censored, and 1 if it is failure.
Initially we need to generate the Kaplan-Meier estimator for men and women:
fit <- survfit(Surv(job_tenure,status)~sex, data=weco)
plot(fit, lty=c(1,3), xlab="Time", ylab="Survival Probability")
legend(2000,0.7, legend=c("Female", "Male"), lty=c(1,3) )
Next you may want to test formally for differences between this groups. To do so you can use the survdiff
function:
survdiff(Surv(job_tenure,status)~sex, data=weco)
Call:
survdiff(formula = Surv(job_tenure, status) ~ sex, data = weco)
N Observed Expected (O-E)^2/E (O-E)^2/V
sex=0 296 240 279 5.38 10.6
sex=1 387 332 293 5.12 10.6
Chisq= 10.6 on 1 degrees of freedom, p= 0.00111
Cox proportional hazard model
Next the PS asks for the estimation of a Cox proportional hazard model. You can estimate such model as follows:
coxph<-coxph(Surv(job_tenure,status)~sex+dex+lex+lex2, data=weco)
coxph
Call:
coxph(formula = Surv(job_tenure, status) ~ sex + dex + lex +
lex2, data = weco)
coef exp(coef) se(coef) z p
sex 0.5444 1.724 0.08819 6.17 6.7e-10
dex -0.0920 0.912 0.00668 -13.77 0.0e+00
lex -1.1586 0.314 0.32682 -3.54 3.9e-04
lex2 0.0464 1.047 0.01317 3.52 4.2e-04
Likelihood ratio test=217 on 4 df, p=0 n= 683, number of events= 572
And plot the results by calling:
plot(survfit(coxph))
Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu↩