Tuesday 22 May 2018

Concerns arising from the R-squared value in the multiple regression report being too low


Some Independent Study students have expressed concerns about the R-squared value in their multiple regression report being too low. The figure appears as "R Square" in the "Regression Statistics" box in the top part of the Excel multiple regression report; an example is as follows:


SUMMARY OUTPUT
Regression Statistics
Multiple R: 0.231187651
R Square: 0.05344773
Adjusted R Square: 0.029783923
Standard Error: 1.08223436
Observations: 124


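In the report shown above, the R Square figure of 0.05344773 is very low: the regression formula explains only about 5.3% of the variation in the y-variable, so the statistics generated are not reliable as an explanation of that variable.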
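As a quick sanity check on how the figures in the "Regression Statistics" box relate to each other, here is a minimal Python sketch. It uses the numbers from the report above. The number of x-variables (3) is my assumption: it is not shown in the excerpt, but it is the value that reproduces the reported Adjusted R Square under the usual formula.

```python
# Figures copied from the "Regression Statistics" box of the Excel report above.
multiple_r = 0.231187651
r_square = 0.05344773
adjusted_r_square = 0.029783923
n_observations = 124

# ASSUMPTION: the excerpt does not show how many x-variables were used.
# 3 is inferred because it reproduces the reported Adjusted R Square.
k_predictors = 3

# R Square is the square of Multiple R.
r_square_check = multiple_r ** 2

# Adjusted R Square penalises R Square for the number of x-variables used.
adjusted_check = 1 - (1 - r_square) * (n_observations - 1) / (
    n_observations - k_predictors - 1
)

print(f"Multiple R squared: {r_square_check:.6f} (report: {r_square})")
print(f"Adjusted R Square:  {adjusted_check:.6f} (report: {adjusted_r_square})")
print(f"Share of variation explained: {r_square:.1%}")
```

Because Adjusted R Square is even lower than R Square, adding more x-variables of the same kind is unlikely to help by itself.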



The meaning of R-squared is explained further here.

What Is R-squared? (source link)
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
The definition of R-squared is fairly straightforward; it is the percentage of the response variable variation that is explained by a linear model. Or:
R-squared = Explained variation / Total variation
R-squared is always between 0 and 100%:
  • 0% indicates that the model explains none of the variability of the response data around its mean.
  • 100% indicates that the model explains all the variability of the response data around its mean.
In general, the higher the R-squared, the better the model fits your data. 
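To make the ratio "Explained variation / Total variation" concrete, here is a small Python sketch. It is only an illustration: the data values are made up, and it fits a single x-variable by ordinary least squares for brevity, whereas the Excel report deals with several x-variables. The function names are my own.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(y, fitted):
    """Explained variation divided by total variation around the mean of y."""
    mean_y = sum(y) / len(y)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((yi - mean_y) ** 2 for yi in y)
    return explained / total


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 1.9, 3.8, 3.1, 5.2, 4.4]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
r2 = r_squared(y, fitted)
print(f"R-squared = {r2:.1%}")
```

For a least squares fit with an intercept, this ratio equals 1 minus (unexplained variation / total variation), which is why the result stays between 0 and 100%.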


If students are concerned about the quality of the Excel report statistics because the R-squared figure is very low, they could try revising their multiple regression formula (e.g. changing the choice of x-variables) as an exploratory exercise to see whether the resultant R-squared figure gets higher. Whatever revision is made to the multiple regression formula, students still need to justify their formula design with literature review findings.

Given that the Independent Study assignment is a modest research exercise with quite a few policy and resource constraints, I would expect the resultant R-squared figure to be quite low in many cases. [Note: very often, the multiple regression formulas made by students are relatively simple, covering only about 3-4 variables of quite similar nature; that means the formula used is very likely neither comprehensive nor sophisticated. Because of this, the resultant R-squared figure is likely to be very low as well. Sometimes the combination of x-variables used is not good, e.g. using both ROA and ROE as x-variables - they are too similar in this case.] If students intend to stick to the regression formula they have already made, they could, when doing data analysis, raise the low value of the R-squared figure as a concern in their IS reports.
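One simple way to check whether two x-variables such as ROA and ROE are "too similar" is to compute the correlation between them before running the regression. The Python sketch below is only an illustration: the ROA and ROE values are made up, and the 0.8 cut-off is a common rule of thumb that I am assuming here, not a fixed standard.

```python
from math import sqrt


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical figures (in %) for six companies, for illustration only.
roa = [2.0, 3.5, 1.2, 4.8, 2.9, 3.1]
roe = [5.1, 8.4, 3.0, 11.9, 7.2, 7.5]

# ASSUMPTION: 0.8 is a rule-of-thumb threshold, not a formal requirement.
THRESHOLD = 0.8

r = pearson_correlation(roa, roe)
too_similar = abs(r) > THRESHOLD
print(f"Correlation between ROA and ROE: {r:.3f}")
print("Too similar to use together" if too_similar else "Acceptable to use together")
```

In Excel, the same check can be done with the CORREL function on the two x-variable columns.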

