-------------------------------------------------------------------------------
       log:  I:\Web\LearnEconometrics\data\stata\heckit.log
  log type:  text
 opened on:  16 Apr 2008, 14:40:02

The first step is to take a look at the data. Notice that wage is missing,
not zero, for women who do not work: only 1,343 of the 2,000 observations
have a wage. There is therefore no need to restrict the regression to
positive values of wage in this case.

. summarize wage educ age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |      1343    23.69217    6.305374    5.88497   45.80979
   education |      2000      13.084    3.045912         10         20
         age |      2000      36.208     8.28656         20         59

. webuse womenwk, clear

First, run a regression by least squares using all available observations
(the 1,343 women with an observed wage).

. regress wage educ age

      Source |       SS       df       MS              Number of obs =    1343
-------------+------------------------------           F(  2,  1340) =  227.49
       Model |  13524.0337     2  6762.01687           Prob > F      =  0.0000
    Residual |  39830.8609  1340  29.7245231           R-squared     =  0.2535
-------------+------------------------------           Adj R-squared =  0.2524
       Total |  53354.8946  1342  39.7577456           Root MSE      =   5.452

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .8965829   .0498061    18.00   0.000     .7988765    .9942893
         age |   .1465739   .0187135     7.83   0.000      .109863    .1832848
       _cons |   6.084875   .8896182     6.84   0.000     4.339679    7.830071
------------------------------------------------------------------------------

Now, estimate the model allowing for sample selection, by maximum likelihood
(MLE). Selection is determined by marital status, children, education and
age. Notice that Stata does not estimate rho and sigma directly. It estimates
the transformed parameters athrho and lnsigma, which keeps rho inside (-1, 1)
and sigma positive during maximization. The implied values of rho and sigma
appear at the bottom of the results table. Note that
athrho = (1/2)ln((1+rho)/(1-rho)) and lnsigma = ln(sigma). Stata also tests
the hypothesis that rho is zero and rejects it at the 5% level.

. heckman wage educ age, select(married children educ age)

Iteration 0:   log likelihood = -5178.7009
Iteration 1:   log likelihood = -5178.3049
Iteration 2:   log likelihood = -5178.3045

Heckman selection model                         Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(2)       =    508.44
Log likelihood = -5178.304                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9899537   .0532565    18.59   0.000     .8855729    1.094334
         age |   .2131294   .0206031    10.34   0.000     .1727481    .2535108
       _cons |   .4857752   1.077037     0.45   0.652    -1.625179     2.59673
-------------+----------------------------------------------------------------
select       |
     married |   .4451721   .0673954     6.61   0.000     .3130794    .5772647
    children |   .4387068   .0277828    15.79   0.000     .3842534    .4931601
   education |   .0557318   .0107349     5.19   0.000     .0346917    .0767718
         age |   .0365098   .0041533     8.79   0.000     .0283694    .0446502
       _cons |  -2.491015   .1893402   -13.16   0.000    -2.862115   -2.119915
-------------+----------------------------------------------------------------
     /athrho |   .8742086   .1014225     8.62   0.000     .6754241    1.072993
    /lnsigma |   1.792559    .027598    64.95   0.000     1.738468     1.84665
-------------+----------------------------------------------------------------
         rho |   .7035061   .0512264                      .5885365    .7905862
       sigma |   6.004797   .1657202                       5.68862    6.338548
      lambda |   4.224412   .3992265                      3.441942    5.006881
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    61.20   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

Compare the MLE to the two-step estimator. The estimates change very little
in this case. Note that lambda = rho*sigma, so a significant lambda is
evidence that rho is not zero and that sample selection cannot safely be
ignored.

. heckman wage educ age, select(married children educ age) twostep

Heckman selection model -- two-step estimates   Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(4)       =    551.37
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9825259   .0538821    18.23   0.000     .8769189    1.088133
         age |   .2118695   .0220511     9.61   0.000     .1686502    .2550888
       _cons |   .7340391   1.248331     0.59   0.557    -1.712645    3.180723
-------------+----------------------------------------------------------------
select       |
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
-------------+----------------------------------------------------------------
mills        |
      lambda |   4.001615   .6065388     6.60   0.000     2.812821     5.19041
-------------+----------------------------------------------------------------
         rho |    0.67284
       sigma |  5.9473529
      lambda |  4.0016155   .6065388
------------------------------------------------------------------------------

. log close
       log:  I:\Web\LearnEconometrics\data\stata\heckit.log
  log type:  text
 closed on:  16 Apr 2008, 14:41:19
-------------------------------------------------------------------------------
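
The relationship between the reparameterized estimates and the reported rho, sigma and lambda can be checked by hand. The sketch below is not part of the original Stata session. It is a small Python cross-check that assumes only the formula quoted in the log, athrho = (1/2)ln((1+rho)/(1-rho)), whose inverse is rho = tanh(athrho), together with sigma = exp(lnsigma) and lambda = rho*sigma. The variable names are illustrative, and the numbers are copied from the two heckman tables.

```python
import math

# Estimates copied from the maximum likelihood heckman output.
athrho = 0.8742086
lnsigma = 1.792559

# Invert athrho = (1/2) ln((1 + rho) / (1 - rho)); the inverse is tanh.
rho = math.tanh(athrho)
# lnsigma is the natural log of sigma.
sigma = math.exp(lnsigma)
# The coefficient on the inverse Mills ratio is lambda = rho * sigma.
lam = rho * sigma

print(f"MLE:      rho = {rho:.6f}  sigma = {sigma:.6f}  lambda = {lam:.6f}")

# Two-step output reports rho and sigma directly; check lambda = rho * sigma.
rho_2s = 0.67284
sigma_2s = 5.9473529
lam_2s = rho_2s * sigma_2s

print(f"Two-step: rho = {rho_2s:.5f}  sigma = {sigma_2s:.7f}  lambda = {lam_2s:.6f}")
```

The computed values should agree with the rho, sigma and lambda rows of the two tables up to the rounding of the printed inputs.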