Saturday, 24 December 2011

On chi-square test - a brief note

The chi-square test is a popular statistical technique used in dissertation projects. Here I use the example from Morris (Chapter 11, pp. 239-248) to illustrate it:

In this example, you have some statistics on work performance improvements following a corporate training programme, grouped by age group (Table 1).

Your question is: does age group affect the effectiveness of the corporate training? To answer it, you draw up another table (Table 2).



As the overall ratio of "improved" to "did not improve" is 40:20, or 2:1, we come up with the expected numbers shown in red in Table 2 above: each age group's total is split in the same 2:1 ratio. In brief, if the observed numbers (in black) in Table 2 are very different from the expected numbers (in red), we have good reason to believe that "age group" does affect "training effectiveness" in this case. The chi-square value measures the aggregate deviation between the observed numbers and the expected numbers. You need to refer to Morris (Chapter 11) to study the case and learn how to calculate the chi-square value.

There is an applet on the chi-square curve that helps you to conduct chi-square hypothesis testing; see: http://stat-www.berkeley.edu/users/stark/Java/Html/chiHiLite.htm.

To perform a chi-square test, you also need to calculate the degrees of freedom for your statistics. The formula in this case is (r - 1) x (c - 1), where r is the number of rows in your table and c is the number of columns. In our case above, r is 3 (under 35; 35-50; over 50) and c is 2 (improved; did not improve), so the degrees of freedom are (3 - 1) x (2 - 1) = 2.
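For readers who like to check the arithmetic in code, here is a minimal Python sketch of the expected numbers and the degrees of freedom. The observed counts are hypothetical (they are not the figures from Morris); they were chosen only so that the overall ratio is 40:20, as in the example. The function names are my own.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """(r - 1) x (c - 1), where r = number of rows and c = number of columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# HYPOTHETICAL counts, not the Morris data.
# Rows: under 35; 35-50; over 50. Columns: improved; did not improve.
observed = [
    [12, 3],
    [20, 10],
    [8, 7],
]

expected = expected_counts(observed)
df = degrees_of_freedom(observed)

print(expected)  # each row total split in the overall 2:1 ratio
print(df)
```

With these made-up counts, each age group's total is split 2:1, and the degrees of freedom come out as 2, as in the case above.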

The main steps of the hypothesis test are:

  1. Formulate the null hypothesis (of the "no association" type)
  2. Calculate the expected numbers
  3. Calculate the chi-square value (for the actual statistics)
  4. Calculate the degrees of freedom (based on the number of rows and columns in the table constructed)
  5. Compare the chi-square value of your statistics (from step 3) with the critical chi-square value (based on the degrees of freedom and the level of significance set)
  6. If the chi-square value of your statistics is greater than the critical value, reject the null hypothesis.
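Steps 3, 5 and 6 can be sketched in Python as below. This is only an illustration under stated assumptions: the observed and expected counts are hypothetical (not the Morris figures), the function names are my own, and the critical value of 5.991 is the one for 2 degrees of freedom at the 5% level of significance.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells (step 3)."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def reject_null(statistic, critical_value):
    """Steps 5 and 6: reject only if the statistic exceeds the critical value."""
    return statistic > critical_value


# HYPOTHETICAL counts, not the Morris data.
# Rows: under 35; 35-50; over 50. Columns: improved; did not improve.
observed = [[12, 3], [20, 10], [8, 7]]
expected = [[10, 5], [20, 10], [10, 5]]  # each row total split 2:1

CRITICAL_VALUE = 5.991  # 2 degrees of freedom, 5% level of significance

statistic = chi_square_statistic(observed, expected)
decision = reject_null(statistic, CRITICAL_VALUE)

print(round(statistic, 2))
print("reject null hypothesis" if decision else "do not reject null hypothesis")
```

With these made-up counts the statistic is well below the critical value, so the null hypothesis of "no association" would not be rejected.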

In our case, the actual chi-square value of the statistics is 2.95, while the critical chi-square value is 5.991, with the degrees of freedom (v) at 2 and the level of significance set at 5%. Since 2.95 is less than 5.991 (that is, 2.95 is not an extreme enough value), the null hypothesis is not rejected. Also see the diagram here:





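There is an Excel function for the chi-square test, =CHITEST. It takes the range of observed numbers and the range of expected numbers, and returns the p-value of the test rather than the chi-square value itself. The following 2 diagrams are illustrative: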






Please note that, in our case, the actual chi-square value of 2.95 corresponds to a p-value of 0.228779, and the critical chi-square value of 5.991 corresponds to a p-value of 0.05 (the level of significance). Since 0.228779 > 0.05, the null hypothesis is not rejected. Another way to put it: since 2.95 < 5.991, the null hypothesis is not rejected.

This note is not meant to replace the textbook reading, but it can be used to reinforce students' learning in class. To fully understand the chi-square test, you need to study statistics textbooks on hypothesis testing, including the notion of the p-value.
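The two p-values quoted above can be checked with a few lines of Python. The sketch below relies on a shortcut that holds only when the degrees of freedom are exactly 2: in that special case the p-value is exp(-x/2) and the critical value is -2 x ln(alpha). For other degrees of freedom you would need a chi-square table or a statistics package. The function names are my own.

```python
import math


def p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square value; valid for 2 degrees of freedom only."""
    return math.exp(-chi_square / 2)


def critical_value_df2(alpha):
    """Critical chi-square value at significance level alpha; 2 degrees of freedom only."""
    return -2 * math.log(alpha)


p_value = p_value_df2(2.95)
critical = critical_value_df2(0.05)

print(round(p_value, 6))   # 0.228779
print(round(critical, 3))  # 5.991

# Both comparisons lead to the same decision.
print(p_value > 0.05)      # True: do not reject the null hypothesis
print(2.95 < critical)     # True: do not reject the null hypothesis
```

Comparing the p-value with the level of significance and comparing the chi-square value with the critical value are two views of the same test, so they always agree.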


Reference
  • Morris, C. (2003) Quantitative Approaches in Business Studies, Prentice Hall.
