Sunday, 3 June 2018

The concern that multiple regression analysis results appear strange and unreliable

The concern that multiple regression analysis results appear strange and unreliable - How to handle the matter:


There are several ways to interpret multiple regression analysis results when they appear unreliable (e.g. the R squared figure is approaching zero) and the b-values of the x variables differ from those reported in the academic literature:

Response 1: The results differ from those in the academic literature because your study differs from the existing studies in time, place and sample. Time: your study is done in 2018, for example, while the studies reported in the literature were done many years ago, and we need to recognize that phenomena (such as correlation patterns) change over time. Place: your study is done in HK, for example, while the studies reported in the literature were done in the USA and Europe, where the social and economic conditions are different. Sample (i.e. company profiles): your study is on the service sector, for example, while the studies reported in the academic literature are on the manufacturing sector.

Response 2: The multiple regression formula considers very few x variables; the model is neither sophisticated nor comprehensive to start with, so it leaves much of the variation in the y variable unexplained.

Response 3: Multiple regression analysis assumes a linear best-fit line in its calculation. It is possible that the empirical data have a much closer but non-linear correlation (e.g. curvilinear); see Diagram 1.
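To make Response 3 concrete, here is a minimal sketch in plain Python. The data are made up for illustration (y is exactly x squared), and the helper name `r_squared` is my own, not taken from any statistics package. It shows that a straight-line fit can report an R squared of about zero even when y depends perfectly on x in a curvilinear way.

```python
def r_squared(xs, ys):
    """R squared of the least-squares straight line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # For a one-variable linear fit, R squared equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)

# Hypothetical data: a perfect U-shaped (curvilinear) relationship.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

linear_r2 = r_squared(xs, ys)                    # y regressed on x
curved_r2 = r_squared([x ** 2 for x in xs], ys)  # y regressed on x squared

print(f"R squared, y regressed on x:         {linear_r2:.3f}")
print(f"R squared, y regressed on x squared: {curved_r2:.3f}")
```

In this made-up case the weak linear result says nothing about how strongly the two variables are related. It is worth plotting the data, or trying a transformed x variable, before concluding that there is no relationship.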





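Another potential problem of correlation analysis, including multiple regression analysis, is known as Simpson's Paradox: a correlation that appears within each subgroup of the data can weaken, disappear or even reverse when the subgroups are combined. See Diagram 2 for further illustration of the idea: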

Diagram 2
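Simpson's Paradox can also be shown with a small numerical sketch. The two groups below are made-up figures (think of them as two hypothetical company segments), and the helper name `slope` is my own. Within each group the best-fit line slopes downward, yet the pooled data give an upward slope.

```python
def slope(xs, ys):
    """Slope (b-value) of the least-squares straight line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data for two subgroups; y falls as x rises inside each group.
group_a_x, group_a_y = [1, 2, 3, 4], [8, 7, 6, 5]
group_b_x, group_b_y = [6, 7, 8, 9], [18, 17, 16, 15]

slope_a = slope(group_a_x, group_a_y)
slope_b = slope(group_b_x, group_b_y)
# Pooling the groups ignores which group each observation belongs to.
slope_pooled = slope(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"b-value within group A: {slope_a:.2f}")
print(f"b-value within group B: {slope_b:.2f}")
print(f"b-value for pooled data: {slope_pooled:.2f}")
```

The sign of the pooled b-value is the opposite of the sign within each group, so a pooled regression that leaves out the grouping variable can mislead. This is one reason why a b-value may differ from the literature when the sample mixes different kinds of companies.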



Lastly, I need to point out that regression results showing no strong correlation can still be useful findings.
