/* This is the dummy variable example given in class
**
** In this example we show that the two-sample t-test and the
** regression-based test that regresses average income on a dummy
** variable yield identical results (up to the sign of the difference).
**
** Steps:
** 1. Create a dummy variable that takes the value 1 if income in
**    the district is above average and zero otherwise.
**      . summarize avginc
**      . generate highinc = 0
**      . replace highinc = 1 if avginc > 15.31
** 2. Do the two-sample t-test. To get identical results we assume
**    that the variances in the two subsamples are the same.
**      . ttest avginc, by(highinc)
** 3. Run the regression.
**      . regress avginc highinc
**
** Output from the STATA log file follows.
*/

       log:  C:\Temp\public_html\class\4213\f2004\dummy.smcl
  log type:  smcl
 opened on:  13 Oct 2004, 15:19:27

. use "C:\Gauss\Papers\probit\caschool.dta", clear

. summarize avginc

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      avginc |       420    15.31659     7.22589      5.335     55.328

. gen highinc = 0

. replace highinc = 1 if avginc > 15.31
(152 real changes made)

. ttest avginc, by(highinc)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     268    11.46686    .1433048    2.346001    11.18471    11.74901
       1 |     152    22.10426    .6410858    7.903837    20.83761    23.37092
---------+--------------------------------------------------------------------
combined |     420    15.31659    .3525873     7.22589    14.62353    16.00965
---------+--------------------------------------------------------------------
    diff |            -10.6374     .518575               -11.65674   -9.618062
------------------------------------------------------------------------------
Degrees of freedom: 418

                  Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t = -20.5128                t = -20.5128              t = -20.5128
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000

. regress avginc highinc

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =  420.77
       Model |  10974.8903     1  10974.8903           Prob > F      =  0.0000
    Residual |   10902.559   418  26.0826771           R-squared     =  0.5017
-------------+------------------------------           Adj R-squared =  0.5005
       Total |  21877.4493   419  52.2134829           Root MSE      =  5.1071

------------------------------------------------------------------------------
      avginc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     highinc |    10.6374    .518575    20.51   0.000     9.618062    11.65674
       _cons |   11.46686    .311967    36.76   0.000     10.85364    12.08008
------------------------------------------------------------------------------

. log close
       log:  C:\Temp\public_html\class\4213\f2004\dummy.smcl
  log type:  smcl
 closed on:  13 Oct 2004, 15:19:57
-------------------------------------------------------------------------------
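
The agreement between the two outputs can be checked by hand from the group summaries in the ttest table. The sketch below is not part of the original log; it is a minimal Python illustration (the function name `pooled_t_test` and the dictionary keys are made up for this example) that recomputes the pooled-variance t statistic from the reported group sizes, means, and standard deviations. Because the inputs are the rounded values printed in the log, the results should match the logged figures only up to rounding.

```python
import math


def pooled_t_test(n0, mean0, sd0, n1, mean1, sd1):
    """Equal-variance two-sample t-test computed from group summaries.

    Hypothetical helper for illustration; inputs are the per-group
    Obs, Mean, and Std. Dev. columns of the ttest table.
    """
    df = n0 + n1 - 2
    # Pooled sum of squared deviations. With a single dummy regressor the
    # fitted values are the group means, so this is also the residual SS.
    rss = (n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2
    root_mse = math.sqrt(rss / df)
    # Same sign convention as the log: mean(0) - mean(1).
    diff = mean0 - mean1
    se_diff = root_mse * math.sqrt(1.0 / n0 + 1.0 / n1)
    return {
        "df": df,
        "rss": rss,
        "root_mse": root_mse,
        "diff": diff,
        "se_diff": se_diff,
        "t": diff / se_diff,
        # The regression constant is the group-0 mean, so its standard
        # error is root_mse / sqrt(n0).
        "se_cons": root_mse / math.sqrt(n0),
    }


# Group summaries copied from the ttest output above.
result = pooled_t_test(268, 11.46686, 2.346001, 152, 22.10426, 7.903837)
for name, value in result.items():
    print(f"{name:>8} = {value:.4f}")
```

The regression coefficient on highinc is mean(1) - mean(0), which is why it carries the opposite sign from the ttest's diff while the standard error and the absolute t statistic coincide. With one regressor, the F statistic reported by regress is the square of that t statistic.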