Global and local spatial autocorrelation chapters - standardization?

Open iamwfx opened this issue 2 years ago • 3 comments

Hi, in both chapters you mention standardizing Pct_Leave and w_Pct_Leave. In the global spatial autocorrelation chapter, standardization is described as only subtracting the mean (which is what the code reflects), while the local spatial autocorrelation chapter describes it as subtracting the mean and dividing by the standard deviation, though the code only subtracts the means.

iamwfx • Mar 04 '23 19:03

Also, there is a typo here in the original text: db["w_Pct_Leave_std"] = db["w_Pct_Leave"] - db["Pct_Leave"].mean() should subtract db["w_Pct_Leave"].mean() instead.

iamwfx • Mar 04 '23 19:03

The top one we will definitely fix, thanks!

The bottom one, though, is in fact correct but unclear in the "Local" chapter, and technically wrong in the "Global" chapter. We've tried to edit this before and clearly failed. I'll change both to be consistent and compute the "spatial lag of centered % leave," since this is what we use and also what we intend to discuss.
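
A minimal sketch of why subtracting the mean of Pct_Leave from w_Pct_Leave gives the "spatial lag of centered % leave". It assumes row-standardized weights (each lag is the plain average of the neighbours), which this thread does not state explicitly; the five-area chain and its values are made up for illustration, and plain lists stand in for the book's db columns.

```python
from statistics import mean

# Hypothetical toy data: five areas on a line, each neighbouring the adjacent ones.
pct_leave = [90.0, 50.0, 20.0, 54.0, 80.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: the average of each area's neighbours."""
    return [mean(values[j] for j in neighbors[i]) for i in range(len(values))]


x_bar = mean(pct_leave)

# Option A: center the variable first, then take its spatial lag.
centered = [x - x_bar for x in pct_leave]
lag_of_centered = spatial_lag(centered, neighbors)

# Option B: lag the raw variable, then subtract the mean of the ORIGINAL variable.
lag = spatial_lag(pct_leave, neighbors)
lag_minus_x_bar = [wx - x_bar for wx in lag]

# Option C: lag the raw variable, then subtract the mean of the LAG itself.
lag_bar = mean(lag)
lag_minus_lag_bar = [wx - lag_bar for wx in lag]

print(lag_of_centered)
print(lag_minus_x_bar)
print(lag_minus_lag_bar)
```

Under that assumption, options A and B agree (up to floating-point rounding) because each row of the weights sums to one, while option C differs whenever the mean of the lag is not the mean of the variable.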

To explain (for @darribas and @sjsrey in future edits...), the original statistic is stated only in terms of z and w. There's no mean(Wz) in the statistic, just in the scatterplot. In the original LISA paper, the means of W.TOTCON and TOTCON are drawn as dashed lines. But the dashed line for W.TOTCON is above zero on the y-axis:

[Image: Screenshot 2023-03-06 at 14 05 19]

So, we definitely plot centered x vs. W(centered x)... but what about the "axis" we're plotting onto it?

Well... where we may differ from GeoDa (and from Anselin (1995)) is that we also classify based on the original mean of x. spdep also does this afaict. I think this is the correct approach, too: in this scheme, "high-high" classifications reflect observations with higher-than-average x and also higher-than-average x nearby. Classifying with respect to mean(Wx) shifts this latter part to "higher-than-average spatial lag", which is less intuitive...
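
A rough sketch of how the two classification schemes can disagree. The quadrant() helper, the two-letter labels (first letter for x, second for its lag), and the toy data are all hypothetical, and the lag is assumed to be a row-standardized neighbour average; this is not the code used by the book, GeoDa, or spdep.

```python
from statistics import mean

# Hypothetical toy data: five areas on a line, each neighbouring the adjacent ones.
pct_leave = [90.0, 50.0, 20.0, 54.0, 80.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Row-standardized spatial lag: the average of each area's neighbours.
lag = [mean(pct_leave[j] for j in neighbors[i]) for i in range(len(pct_leave))]

x_bar = mean(pct_leave)  # mean of the original variable
lag_bar = mean(lag)      # mean of the spatial lag


def quadrant(x, wx, x_ref, lag_ref):
    """Label a point by whether x and its lag exceed their reference values."""
    return ("H" if x > x_ref else "L") + ("H" if wx > lag_ref else "L")


# Scheme 1: both x and its lag are compared with mean(x).
by_x_mean = [quadrant(x, wx, x_bar, x_bar) for x, wx in zip(pct_leave, lag)]

# Scheme 2: the lag is compared with mean(Wx) instead.
by_lag_mean = [quadrant(x, wx, x_bar, lag_bar) for x, wx in zip(pct_leave, lag)]

print(by_x_mean)
print(by_lag_mean)
```

In this toy case the second and fifth areas change labels between the schemes, because their lags fall between mean(Wx) and mean(x).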

None of it affects the actual slope of the line in the plot, just the intercept. Our version will give the intercept of the line as the average of the spatial lag when x is at its mean, while the other version would force a regression through the origin.
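
A small sketch of the slope/intercept point, under the same kind of assumptions: made-up values, a row-standardized lag, and a hand-rolled ols() helper that is purely illustrative. It fits the scatterplot line once against the lag of centered x and once against that lag re-centered on its own mean.

```python
from statistics import mean

# Hypothetical toy data: five areas on a line, each neighbouring the adjacent ones.
pct_leave = [90.0, 50.0, 20.0, 54.0, 80.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def ols(xs, ys):
    """Slope and intercept of a simple least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Centered variable and its row-standardized spatial lag.
x_bar = mean(pct_leave)
z = [x - x_bar for x in pct_leave]
lag_z = [mean(z[j] for j in neighbors[i]) for i in range(len(z))]

# Alternative: additionally center the lag on its own mean.
lag_z_bar = mean(lag_z)
lag_z_recentered = [v - lag_z_bar for v in lag_z]

slope_a, intercept_a = ols(z, lag_z)             # lag of centered x
slope_b, intercept_b = ols(z, lag_z_recentered)  # lag centered on its own mean

print(slope_a, intercept_a)
print(slope_b, intercept_b)
```

The two slopes coincide; the first intercept equals the average lag (the fitted lag when x is at its mean), and the second is zero up to rounding, so that line passes through the origin.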

ljwolf • Mar 06 '23 14:03

Paging this thread which also discussed the matter:

https://github.com/gdsbook/book/issues/32#issuecomment-766918790

darribas • Mar 07 '23 10:03