1.You have obtained a sub-sample of 1744 individuals from the Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression yielded the following result:
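$\widehat{Earnings} = \underset{(42.71)}{239.16} + \underset{(2.71)}{5.20}\,Age$, $R^2 = 0.05$, $SER = 287.21$
where Earnings and Age are measured in dollars and years, respectively, and standard errors are shown in parentheses below the coefficients.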
a.Interpret the intercept.
b.Interpret the coefficient on Age.
c.The average age in this sample is 37.5 years. What are average weekly earnings in the sample?
d.At a significance level of 5%, we would say that the slope coefficient is not statistically significantly different from zero. Calculate the relevant confidence interval to show this.
e.Consider a 45-year-old in the sample with weekly earnings of $1,000. Calculate the residual for this person. Explain what your answer means in words.
g.Give an example of a factor that may be causing omitted variable bias. Based on the direction of the bias, after adding the omitted variable, will the coefficient on Age in this multiple regression be > 5.20 or < 5.20?
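The arithmetic behind parts (c), (d), and (e) can be sketched in a few lines of Python. This is a minimal sketch rather than a model answer: the variable and function names are illustrative, and the critical value 1.96 assumes the large-sample normal approximation, which seems reasonable with 1744 observations.

```python
# Reported estimates from the regression in question 1.
intercept = 239.16   # dollars per week
slope = 5.20         # dollars per week per year of age
se_slope = 2.71      # standard error of the slope


def predict(age):
    """Predicted weekly earnings (in dollars) at a given age."""
    return intercept + slope * age


# (c) The OLS line passes through the sample means, so the prediction
#     at the mean age equals mean earnings in the sample.
mean_earnings = predict(37.5)

# (d) 95% confidence interval for the slope.
#     Assumption: large-sample normal critical value of 1.96.
critical_value = 1.96
ci_low = slope - critical_value * se_slope
ci_high = slope + critical_value * se_slope
ci_contains_zero = ci_low <= 0.0 <= ci_high

# (e) Residual = actual earnings minus predicted earnings.
residual = 1000.0 - predict(45)

print(f"mean earnings: {mean_earnings:.2f}")
print(f"95% CI for slope: ({ci_low:.2f}, {ci_high:.2f}), contains zero: {ci_contains_zero}")
print(f"residual for the 45-year-old: {residual:.2f}")
```

The numbers printed by the sketch still need to be interpreted in words, which is what the questions ask for.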
2.As we all know, correlation does not imply causation. In other words, if X and Y are correlated, it doesn’t necessarily mean that X causes Y; there are two other possible explanations. What are they? Take two of these three explanations for the correlation between X and Y and, using an example, show that these two explanations do not have to be mutually exclusive. In other words, give an example of an X and Y that are related in two different ways simultaneously.
3.You have estimated an earnings function, where you regressed the log of earnings on two binary variables, one for gender and the other for marital status.
a)Write down the regression equation such that the intercept corresponds to a single male, without allowing for interaction between marital status and gender.
b)According to your regression equation from part (a), what are the average log earnings of:
c)Allow for an interaction between the gender and marital status binary variables. Write down the new regression equation.
d)Repeat the exercise from part (b) for the new regression:
e)Explain what including the interaction term allows that was not captured in the original regression.
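One way to see how the group averages in parts (b) and (d) follow from the coefficients is to write the fitted value as a function of the two binary variables. The sketch below assumes the dummies are coded as female and married, so that a single male is the base group. The function name and all coefficient values are made up purely for illustration; they are not estimates from any data set.

```python
def fitted_log_earnings(female, married, b0, b_female, b_married, b_interaction=0.0):
    """Fitted log earnings for a group defined by two 0/1 dummies."""
    return b0 + b_female * female + b_married * married + b_interaction * female * married


# Hypothetical coefficients, chosen only to illustrate the mechanics.
coefs = {"b0": 6.0, "b_female": -0.2, "b_married": 0.1}

# The four groups as (female, married) pairs.
groups = {
    "single male": (0, 0),
    "married male": (0, 1),
    "single female": (1, 0),
    "married female": (1, 1),
}

# Model without the interaction term.
no_interaction = {
    name: fitted_log_earnings(f, m, **coefs) for name, (f, m) in groups.items()
}

# Model with a hypothetical interaction coefficient of -0.05.
with_interaction = {
    name: fitted_log_earnings(f, m, b_interaction=-0.05, **coefs)
    for name, (f, m) in groups.items()
}


def marriage_gap(fitted, sex):
    """Married minus single fitted log earnings for 'male' or 'female'."""
    return fitted[f"married {sex}"] - fitted[f"single {sex}"]


for label, fitted in [("no interaction", no_interaction), ("interaction", with_interaction)]:
    print(label, "men:", round(marriage_gap(fitted, "male"), 4),
          "women:", round(marriage_gap(fitted, "female"), 4))
```

Comparing the marriage gap for men and women under the two specifications is one way to approach part (e).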
4.It has been conjectured that workplace smoking bans induce smokers to quit by reducing their opportunities to smoke. In this assignment you will estimate the effect of workplace smoking bans on smoking using data on a sample of 10,000 U.S. indoor workers from 1991 to 1993, available on the textbook website www.pearsonhighered.com/stock_watson in the file smoking. The data set contains information on whether individuals were or were not subject to a workplace smoking ban, whether the individuals smoked, and other individual characteristics. A detailed description is given in Smoking_Description, available on the website. For the following questions, use any software of your choice, but make sure you include the relevant table showing your results. You will receive zero credit if your regression tables are not included.
a)Estimate the probability of smoking by using smoking ban as your independent variable. Attach your regression table.
b)Estimate a linear probability model with smoker as the dependent variable and the following regressors: smkban, female, age, age2, hsdrop, hsgrad, colsome, colgrad, black, and hispanic. Attach your regression table. Compare the estimated effect of a smoking ban from this regression with your answer in part (a).
c)What does the coefficient on smkban suggest?
d)Is the coefficient on smoking ban statistically significant at the 5% significance level (that is, at the 95% confidence level)?
e)To test the hypothesis that the probability of smoking does not depend on the level of education, what do you have to do? You just need to explain and do not need to actually do the test.
f)What would you do to test whether there is a nonlinear relationship between age and the probability of smoking? You just need to explain and do not need to actually do the test.
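For readers who have not worked with a linear probability model before, the mechanics can be sketched in plain Python. This sketch is not a substitute for the regression tables requested above. It regresses a 0/1 outcome on a single 0/1 regressor and computes a heteroskedasticity-robust (HC1) standard error for the slope. The function name is illustrative, and the ten observations are toy values invented for the example, not drawn from the smoking data set.

```python
from math import sqrt


def lpm_single_regressor(y, x):
    """OLS of y on a constant and x; returns (intercept, slope, robust SE of slope)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC1 robust variance of the slope: the errors in a linear
    # probability model are heteroskedastic by construction.
    meat = sum(((xi - x_bar) ** 2) * (ui ** 2) for xi, ui in zip(x, residuals))
    se_slope = sqrt((n / (n - 2)) * meat / sxx ** 2)
    return intercept, slope, se_slope


# Toy data (made up): 1 = subject to a ban / 1 = smokes.
smkban = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
smoker = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

intercept, slope, se_slope = lpm_single_regressor(smoker, smkban)
t_stat = slope / se_slope

print(f"intercept: {intercept:.4f}, slope: {slope:.4f}")
print(f"robust SE: {se_slope:.4f}, t-statistic: {t_stat:.4f}")
```

With a single binary regressor, the intercept is the share of smokers among workers without a ban, and the slope is the difference in the share of smokers between the two groups. In practice a statistics package would be used for the full model in part (b), with robust standard errors requested explicitly.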