practical-statistics-for-data-scientists

chi-square, resampling approach

Open frahimov opened this issue 11 months ago • 2 comments

Hi, I hope it is OK that I am asking this here. In chapter 3 I am stuck at this step: "3. Find the squared differences between the shuffled counts and expected counts then sum them." Do you mean calculating the chi-square statistic for each resample, where you calculate the Pearson residuals first, or do you literally just sum the squared differences between the observed and expected counts? Thank you.

frahimov, Mar 08 '24 20:03

Hello, thank you for your feedback. This is a good place for general questions. You could probably also have added this one to the errata page on the O'Reilly website.

We meant to calculate the chi-square statistic in step 3, and that is what the code does. That said, you can also just use the sum of squared differences for each resample in step 3, provided that in step 5 you compare it to the observed sum of squared differences rather than to the observed chi-square statistic. The chi-square statistic was developed before the computer age, when it was convenient to have a standardized test statistic that could be compared to standard tables in textbooks. The chi-square statistic and the sum of squared differences are two different (but related) ways to measure the difference between observed and expected counts.
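To make the two options concrete, here is a minimal sketch of the resampling procedure with either test statistic plugged in. It is not the code from the book: the function names (`expected_counts`, `chi2_stat`, `sum_sq_diff`, `permutation_p_value`) and the 2 x 3 table of counts are assumptions chosen for illustration, and it only uses the Python standard library.

```python
import random

def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi2_stat(observed, expected):
    """Chi-square statistic: squared differences scaled by the expected count."""
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

def sum_sq_diff(observed, expected):
    """Plain sum of squared differences, with no scaling."""
    return sum((o - e) ** 2
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

def permutation_p_value(observed, stat, n_resamples=1000, seed=0):
    """Resampling p-value for a 2 x k table (row 0: successes, row 1: failures)."""
    rng = random.Random(seed)
    group_sizes = [sum(col) for col in zip(*observed)]
    n_success = sum(observed[0])
    expected = expected_counts(observed)
    # The statistic on the real data; resamples are compared against this value.
    observed_stat = stat(observed, expected)
    # One "box" holding a 1 for every success and a 0 for every failure.
    box = [1] * n_success + [0] * (sum(group_sizes) - n_success)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(box)
        successes, start = [], 0
        for size in group_sizes:
            successes.append(sum(box[start:start + size]))
            start += size
        failures = [size - s for size, s in zip(group_sizes, successes)]
        # Same statistic for the resample as for the observed data.
        if stat([successes, failures], expected) >= observed_stat:
            exceed += 1
    return exceed / n_resamples

# Illustrative counts (an assumption): three groups of 1000 trials each.
observed = [[14, 8, 12], [986, 992, 988]]
p_chi2 = permutation_p_value(observed, chi2_stat)
p_ssd = permutation_p_value(observed, sum_sq_diff)
print(f"p-value (chi-square): {p_chi2:.3f}")
print(f"p-value (sum of squared differences): {p_ssd:.3f}")
```

The important detail is that `stat` is applied to both the observed table and every resample, so like is always compared with like. In this example all groups have the same size, which makes the two statistics proportional to each other, so the two p-values should agree up to floating-point ties. With unequal group sizes they can differ.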

gedeck, Mar 12 '24 19:03

Hi. Thank you for your response. Actually, I was referring to your book and did not know that there was an errata page on the O'Reilly website. I will post there if I have more questions as I go through the chapters.

frahimov, Mar 15 '24 22:03