Friday 28 October 2022

The more independent variables in a multiple regression formula, the better: a note

Adding more independent variables (Xs) does not by itself make a multiple regression formula more "comprehensive" or give it stronger explanatory power. Some considerations are as follows:


1. If certain X variables are closely correlated with one another (multicollinearity), the regression cannot reliably separate the influence of each of them on Y, and the estimated coefficients become unstable and hard to interpret. You need to check that the X variables are largely independent of one another (e.g. with very low correlation with the other X variables).

2. There is a need to examine how the X variables are structured in relation to one another. It is likely that they do not form a simple many-to-one relationship with the dependent variable (the Y variable); for example, some X variables may affect Y only through other X variables.

3. More fundamentally, there is the problem of "not knowing what we don't know": we cannot tell which other important variables have been omitted from the multiple regression formula.

4. Related to point 3, even if we are aware that a particular independent variable is important for the construction of the multiple regression formula, data on this variable may not be available, or the variable may not be quantifiable/measurable in a satisfactory way.
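
Point 1 can be checked before any regression is run. The following is only a minimal sketch in Python: the data are made up for illustration, the variable names (ad_spend, sales_staff, rainfall) are hypothetical, and the 0.8 cut-off is an assumed rule of thumb rather than a fixed standard. It computes the pairwise Pearson correlation between the X variables and lists the pairs that look too closely related.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlated_pairs(xs, threshold=0.8):
    """Return (name1, name2, r) for each pair of X variables with |r| >= threshold."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(xs.items(), 2):
        r = pearson(col1, col2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged


# Made-up illustrative data: three candidate X variables, eight observations.
xs = {
    "ad_spend": [10, 12, 15, 18, 20, 25, 30, 32],
    "sales_staff": [2, 2, 3, 4, 4, 5, 6, 6],  # moves almost in step with ad_spend
    "rainfall": [5, 1, 4, 2, 6, 3, 5, 2],
}

flagged = correlated_pairs(xs)
for name1, name2, r in flagged:
    print(f"{name1} and {name2} are highly correlated (r = {r})")
```

With these made-up numbers only the ad_spend / sales_staff pair is flagged, which suggests that putting both into the same formula would make their separate influences on Y hard to tell apart. A pairwise check like this is a first screen only; it does not detect an X variable that is a combination of several others.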


Beyond that, there are fundamental limitations of multiple regression analysis, especially when it is done with a relatively simple statistical tool such as Excel. A notable example is that multiple regression analysis assumes a linear equation relating the X variables to the Y variable. It is quite possible that the relationship between the variables is non-linear. Also, the data you gather for the multiple regression analysis can be biased, e.g. when your sample is not a probability sample.
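
The linearity limitation can be illustrated with a small sketch in Python. The data are made up (Y is set exactly equal to the square of X), and simple_ols_r2 is a hypothetical helper written for this note, not a library function. It fits an ordinary least-squares line with one X variable and returns the slope, the intercept and R-squared.

```python
def simple_ols_r2(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up data: Y depends on X perfectly, but through a curve (Y = X squared).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

# A straight-line fit on X finds no relationship at all.
slope_lin, intercept_lin, r2_lin = simple_ols_r2(x, y)

# Regressing Y on the transformed variable X squared recovers the relationship.
x_sq = [xi ** 2 for xi in x]
slope_sq, intercept_sq, r2_sq = simple_ols_r2(x_sq, y)

print(f"linear in X:   slope = {slope_lin:.2f}, R^2 = {r2_lin:.2f}")
print(f"linear in X^2: slope = {slope_sq:.2f}, R^2 = {r2_sq:.2f}")
```

In this deliberately extreme case the straight-line fit reports a slope and R-squared of zero even though X determines Y completely, while the fit on the transformed variable is exact. A low R-squared from a linear formula therefore does not show that an X variable is unimportant.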

