a) Design a simple econometric research project containing a full range of techniques

b) Evaluate basic techniques of econometrics in relation to specified problems

c) Discuss the role and limitations of econometric methods in the analysis of contemporary problems.

• Question 1 (30 marks)

a) The following chart shows a scatter plot of two variables. What is the goodness of fit (R-squared) of the corresponding regression line? Please briefly explain your answer (5 marks).

b) Using the same data as above (Question 1a)), a researcher runs a regression model. The researcher is quite surprised to find that the output does not show a figure for the p-value as it usually would. Please briefly explain why the model cannot provide a p-value in this specific case (5 marks).

c) The following chart shows the distribution of a variable. Briefly explain why it will be problematic to use the variable as a dependent variable in a standard OLS regression (5 marks).

d) Assume that you run a regression with 223 observations. The dependent variable is ‘annual salary’ and there are three independent variables: ‘work experience in years’, ‘education duration in years’ and ‘number of employees in company’. The regression yields the following result for the variable ‘number of employees in company’:

Coefficient estimate: 150.3; standard error: 98.4

Calculate the p-value (two-tailed) and briefly discuss whether employees in larger companies earn significantly higher salaries (5 marks).
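
As a rough numerical check of the calculation in part d), the sketch below computes the t-ratio and a two-tailed p-value. It assumes 223 − 4 = 219 residual degrees of freedom (three regressors plus a constant, which the question does not state explicitly) and approximates the t distribution by the standard normal, which is close at this sample size. The function name `two_tailed_p_normal` is illustrative only.

```python
import math


def two_tailed_p_normal(t_ratio: float) -> float:
    """Two-tailed p-value, using the standard normal as an approximation to t.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    """
    return math.erfc(abs(t_ratio) / math.sqrt(2.0))


coefficient = 150.3     # coefficient estimate given in the question
standard_error = 98.4   # standard error given in the question

t_ratio = coefficient / standard_error
p_value = two_tailed_p_normal(t_ratio)  # assumption: 219 df treated as normal

print(f"t-ratio = {t_ratio:.3f}, approximate two-tailed p-value = {p_value:.3f}")
```

Under these assumptions the p-value is above 0.10, so the coefficient would not be judged significant at conventional levels. An exact t-distribution p-value would be marginally larger.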

e) A researcher wants to find out whether age has an effect on how happy people are. The researcher runs a regression with the dependent variable ‘happiness score’ (0 to 10 with 10 being extremely satisfied) and the independent variable ‘age’ (in years). The modelling results show that age is not significant. You also have a look at the residual plot (shown below). Please explain why the residual plot indicates that the regression generated by the researcher is misleading. Discuss what relationship you expect between age and happiness. Outline how you could work this into the initial regression model and hence improve it (10 marks).

f) You want to know whether people with higher incomes are happier. Your friend has run a survey in their company and run a regression on the data. The dependent variable is ‘happiness score’ (0 to 10 with 10 being extremely satisfied). There is only one independent variable: ‘monthly income’ (in £). Your friend sends you the gretl output of the regression via email. Unfortunately, the file got corrupted and only the F-statistic is legible (see below). Using this output, show that ‘monthly income’ is indeed highly significant (provide the p-value and explain the calculation). Can you tell whether workers with higher incomes are significantly happier? (10 marks)

Model 1: OLS, using observations □□□□
Dependent variable: happiness_score

                  Coefficient   Std. Error   t-ratio   p-value
Const             □□□□          □□□□         □□□□      □□□□□□□□
Monthly_income    □□□□          □□□□         □□□□      □□□□□□□□

Mean dependent var   □□□□       S.D. dependent var   □□□□
Sum squared resid    □□□□       S.E. of regression   □□□□
R-squared            □□□□       Adjusted R-squared   □□□□
F(1, 198)            13.44598   P-value(F)           □□□□
Log-likelihood       □□□□       Akaike criterion     □□□□
Schwarz criterion    □□□□       Hannan-Quinn         □□□□
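
One way to approach part f) numerically is sketched below. It relies on the fact that, in a regression with a single regressor, the F-statistic equals the square of the slope's t-ratio. The p-value uses a standard normal approximation to the t distribution with 198 degrees of freedom, which is an assumption of this sketch and not part of the gretl output.

```python
import math

f_statistic = 13.44598  # the only legible figure in the corrupted output

# With one regressor, F(1, 198) = t^2, so the magnitude of the t-ratio is sqrt(F).
abs_t_ratio = math.sqrt(f_statistic)

# Two-tailed p-value under the normal approximation (assumption: 198 df ~ normal).
p_value_f = math.erfc(abs_t_ratio / math.sqrt(2.0))

print(f"|t| = {abs_t_ratio:.4f}, approximate p-value = {p_value_f:.5f}")
```

Because F = t², only the magnitude of the t-ratio is recovered here. The sign of the coefficient on monthly income cannot be read off the F-statistic alone.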

• Question 2 (40 marks)

Using sample data for height (in inches) and weight (in pounds/lbs) of Major League Baseball players in the United States, a researcher has generated the following model:

Model 1: OLS, using observations 1-83
Dependent variable: weight_pounds

                  Coefficient   Std. Error   t-ratio   p-value
const             −158.102      58.8343      −2.6872   0.00874    ***
height_inches     4.84271       0.800029     6.0532    <0.00001   ***

Mean dependent var   197.8072    S.D. dependent var   22.77218
Sum squared resid    29278.58    S.E. of regression   19.01221
R-squared            0.311463    Adjusted R-squared   0.302963
F(1, 81)             36.64081    P-value(F)           4.22e-08
Log-likelihood       −361.2014   Akaike criterion     726.4028
Schwarz criterion    731.2405    Hannan-Quinn         728.3463

a) Interpret the modelling results with specific focus on goodness of fit, the coefficient estimates and significance (10 marks).

b) Please write the model results in equation form and calculate the predicted weight of a player who is 73 inches tall (5 marks).
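
The prediction in part b) amounts to plugging the height into the fitted line from Model 1. A minimal sketch, using the coefficients from the output above; the function name `predict_weight_model1` is illustrative only.

```python
def predict_weight_model1(height_inches: float) -> float:
    """Fitted weight in pounds from Model 1: const + slope * height."""
    intercept = -158.102  # 'const' coefficient from the Model 1 output
    slope = 4.84271       # 'height_inches' coefficient from the Model 1 output
    return intercept + slope * height_inches


predicted_model1 = predict_weight_model1(73)
print(f"Predicted weight at 73 inches: {predicted_model1:.2f} lbs")
```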

The researcher generates a second model, now including data for age in years. The modelling results are shown below.

Model 2: OLS, using observations 1-83
Dependent variable: weight_pounds

                  Coefficient   Std. Error   t-ratio   p-value
const             −211.373      62.1572      −3.4006   0.00105    ***
height_inches     5.11238       0.790125     6.4703    <0.00001   ***
age               1.17307       0.523714     2.2399    0.02787    **

Mean dependent var   197.8072    S.D. dependent var   22.77218
Sum squared resid    27550.74    S.E. of regression   18.55759
R-squared            0.352097    Adjusted R-squared   0.335899
F(2, 80)             21.73759    P-value(F)           2.89e-08
Log-likelihood       −358.6771   Akaike criterion     723.3542
Schwarz criterion    730.6107    Hannan-Quinn         726.2694

c) Has the inclusion of age improved the initial model? Briefly explain your answer (5 marks).

d) Please write the revised model in equation form and predict the weight of a baseball player who is aged 27 years and is 70 inches tall. How accurate is this prediction? (10 marks)
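
The point prediction in part d) can be checked with the Model 2 coefficients, as in the sketch below. The band of plus or minus two standard errors of the regression is only a crude indication of accuracy: it is an assumption of this sketch and ignores the uncertainty in the coefficient estimates. All names are illustrative.

```python
# Coefficients and S.E. of regression taken from the Model 2 output.
INTERCEPT = -211.373
B_HEIGHT = 5.11238
B_AGE = 1.17307
SE_REGRESSION = 18.55759


def predict_weight_model2(height_inches: float, age_years: float) -> float:
    """Fitted weight in pounds from Model 2."""
    return INTERCEPT + B_HEIGHT * height_inches + B_AGE * age_years


predicted_model2 = predict_weight_model2(height_inches=70, age_years=27)

# Crude band: +/- 2 * S.E. of regression (assumption; not a formal prediction interval).
rough_band = 2 * SE_REGRESSION

print(f"Predicted weight: {predicted_model2:.2f} lbs, roughly +/- {rough_band:.1f} lbs")
```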

e) According to the second model, how much does the weight of a baseball player change within 10 years? Why would a time series model be better to estimate this? (5 marks)

f) Outline how the second model could be further improved (5 marks).

Learning outcomes assessed: b, c

• Question 3 (20 marks)

Using 36 years of data on fuel consumption (G), the price of fuel per litre (Pg), per capita disposable income (Y), a price index for new cars (Pnc), and a price index for public transportation (Ppt), a researcher has estimated the following model.
