CHAPTER: Global Spatial Autocorrelation

Open darribas opened this issue 6 years ago • 12 comments

Thread to discuss chapter on global autocorrelation. Picks up from GDS#4

darribas avatar Jun 30 '19 11:06 darribas

A first draft of questions for this chapter is filed in #20

darribas avatar Jun 30 '19 11:06 darribas

In response to questions,

Questions are solid, discussion is good, and everything should proceed :smile:

ljwolf avatar Jul 16 '19 19:07 ljwolf

I think it's a fair comment. What'd be a good way to keep track of it? Maybe to include on the list of things for the revision stage of the chapter?

darribas avatar Jul 16 '19 19:07 darribas

yeah, linked here, it should be addressed before closing this. I reformatted the Q so that it has a checkbox for its completion. Only when that's complete should we close this issue & move to stable!

ljwolf avatar Jul 16 '19 19:07 ljwolf

Cool! I can't remember if the chapter needs revision too or it's good to go? Last thing I said on the previous repo was:

Just made the final changes to the chapter. I think this needs a light read by somebody to make sure it's good to go, but I don't anticipate any big changes or additions required. Assigned to @sjsrey as it dovetails with his drafting of Ch. 7 on Local autocorrelation.

darribas avatar Jul 16 '19 19:07 darribas

I think the code in the notebook is based on an older version of pysal than is in the container:

nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
# Generate W from the GeoDataFrame
w = weights.Distance.KNN.from_dataframe(db, k=8)
# Row-standardization
w.transform = 'R'
------------------

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-db177a29e4e5> in <module>
      1 # Generate W from the GeoDataFrame
----> 2 w = weights.Distance.KNN.from_dataframe(db, k=8)
      3 # Row-standardization
      4 w.transform = 'R'

AttributeError: module 'pysal.lib.weights' has no attribute 'Distance'

make: *** [Makefile:23: latex] Error 1

(gdsbook) gds/foundry - [jupytext●] » python
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysal
>>> pysal.__version__
'2.0.0'
>>> 

And for the container (refreshed this morning) I have:

docker run -it -p 8888:8888 -v /home/serge/Dropbox/g/gds/foundry:/home/jovyan/host gdsbook /bin/bash
jovyan@2c51554c8b90:~$ python
Python 3.7.1 | packaged by conda-forge | (default, Feb 18 2019, 01:42:00) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysal
/opt/conda/lib/python3.7/site-packages/pysal/model/spvcm/abstracts.py:10: UserWarning: The `dill` module is required to use the sqlite backend fully.
  from .sqlite import head_to_sql, start_sql
>>> pysal.__version__
'2.0.0'
>>> 
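
For anyone hitting the same traceback: the message suggests the notebook was written against a layout with a capitalised `Distance` submodule, while pysal 2.0.0 does not expose that name under `pysal.lib.weights`. A minimal sketch for checking which import path resolves in a given environment is below. The helper `first_available` and the candidate list are hypothetical, written only for this check, and the assumption that `KNN` sits directly under `pysal.lib.weights` (or `libpysal.weights`) should be confirmed against the installed version.

```python
import importlib


def first_available(candidates):
    """Return the first 'module:attr.path' spec that resolves, else None.

    Hypothetical helper for this thread, not part of pysal.
    """
    for spec in candidates:
        module_name, _, attr_path = spec.partition(":")
        try:
            obj = importlib.import_module(module_name)
            # Walk the dotted attribute path one name at a time
            for name in attr_path.split("."):
                obj = getattr(obj, name)
        except (ImportError, AttributeError):
            # Module missing or attribute renamed: try the next candidate
            continue
        return spec
    return None


# Candidate locations for the KNN weights class (assumed, please verify)
PYSAL_CANDIDATES = [
    "pysal.lib.weights:Distance.KNN",  # path used in the failing cell
    "pysal.lib.weights:KNN",           # assumed pysal 2.x location
    "libpysal.weights:KNN",            # assumed standalone libpysal location
]

if __name__ == "__main__":
    # Prints None if neither pysal nor libpysal is installed
    print(first_available(PYSAL_CANDIDATES))
```

If the second candidate is the one that resolves, the failing cell would presumably become `w = weights.KNN.from_dataframe(db, k=8)`, with the row-standardization line unchanged.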

sjsrey avatar Jul 31 '19 16:07 sjsrey

Yes, that was one of the things that weren't quite right. I fixed this and a bunch of other general bits (e.g. data paths) in #51. I'd suggest merging that first.

darribas avatar Aug 01 '19 08:08 darribas

In this cell, Mathias Schlaeffer writes:

I was going through the geographic data science book you co-authored and I have to first say a big thank you for co-authoring this book. It is a great resource and as concise as it can be, well done and thanks again.

But I also come with a question regarding Moran's I which confused me slightly. It is described as the slope of the line fit in the Moran's scatterplot. But I suspect a small mistake went into the part creating the scatterplot. You standardize the variables Pct_Leave and Pct_Leave_Lag separately, each against its own STDDEV and AVG, whereas I believe the variable Pct_Leave_Lag should also be standardized against the Avg(Pct_Leave) and Stddev(Pct_Leave).

I tested the hypothesis by recreating the variables for the scatterplot, and then deriving Moran's I from the actual slope of the line fit rather than the functionality provided in PySal. I get a slope of 0.777 which is higher than the Moran's I of 0.6454 which is cited later on. But if I use instead the described avg and stddev of the Pct_leave I get to the value of approx 0.64 as the slope.

I think that's right; we need to standardize using the mean/sd of y, not Wy?

ljwolf avatar Jan 25 '21 16:01 ljwolf

This is ready for a second pass.

ljwolf avatar Jul 19 '21 16:07 ljwolf

For Further Reading:

  • https://escholarship.org/uc/item/3ph5k0d4 (Anselin what is special)
  • http://gistbok.ucgis.org/bok-topics/global-measures-spatial-association
  • Getis, A. (2007). Reflections on spatial autocorrelation. Regional Science & Urban Economics, 37: 491-496. DOI: 10.1016/j.regsciurbeco.2007.04.005

sjsrey avatar Sep 24 '21 14:09 sjsrey

In this cell, Mathias Schlaeffer writes:

I was going through the geographic data science book you co-authored and I have to first say a big thank you for co-authoring this book. It is a great resource and as concise as it can be, well done and thanks again.

But I also come with a question regarding Moran's I which confused me slightly. It is described as the slope of the line fit in the Moran's scatterplot. But I suspect a small mistake went into the part creating the scatterplot. You standardize the variables Pct_Leave and Pct_Leave_Lag separately, each against its own STDDEV and AVG, whereas I believe the variable Pct_Leave_Lag should also be standardized against the Avg(Pct_Leave) and Stddev(Pct_Leave).

I tested the hypothesis by recreating the variables for the scatterplot, and then deriving Moran's I from the actual slope of the line fit rather than the functionality provided in PySal. I get a slope of 0.777 which is higher than the Moran's I of 0.6454 which is cited later on. But if I use instead the described avg and stddev of the Pct_leave I get to the value of approx 0.64 as the slope.

I think that's right; we need to standardize using the mean/sd of y, not Wy?

[Screenshot from 2022-01-06 09-10-51]

So it's deviations from the mean, not also divided by the sd.
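
A quick way to sanity-check the point about which mean/sd to use is a toy example in plain Python. This is only a sketch: the six-unit chain, the values in `y`, and all function names are made up for illustration and are not the chapter's data or PySAL's API. It assumes row-standardized weights, and compares the scatterplot slope under three scalings of the lag against Moran's I computed directly from its formula.

```python
from statistics import mean, pstdev


def row_standardize(neighbors):
    """Turn {i: [neighbor ids]} into {i: {j: w_ij}} with each row summing to 1."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}


def spatial_lag(w, values):
    """Weighted average of each unit's neighbors (the Wy vector)."""
    return [sum(wij * values[j] for j, wij in w[i].items())
            for i in range(len(values))]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x, with an intercept."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def morans_i(w, y):
    """Moran's I = (n / S0) * z'Wz / z'z, with z the deviations from the mean."""
    n, ybar = len(y), mean(y)
    z = [v - ybar for v in y]
    s0 = sum(sum(row.values()) for row in w.values())
    num = sum(wij * z[i] * z[j] for i, row in w.items() for j, wij in row.items())
    return (n / s0) * num / sum(v * v for v in z)


# Toy data (hypothetical): six units on a line, each linked to adjacent units
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
y = [1.0, 2.0, 4.0, 3.0, 7.0, 9.0]

w = row_standardize(neighbors)
lag = spatial_lag(w, y)
ybar, ysd = mean(y), pstdev(y)
lagbar, lagsd = mean(lag), pstdev(lag)

i_stat = morans_i(w, y)

# Deviations from the mean of y only, applied to both axes
slope_centered = ols_slope([v - ybar for v in y], [l - ybar for l in lag])
# Both axes standardized with the mean and sd of y
slope_shared = ols_slope([(v - ybar) / ysd for v in y],
                         [(l - ybar) / ysd for l in lag])
# Each axis standardized with its own mean and sd (the reported problem)
slope_separate = ols_slope([(v - ybar) / ysd for v in y],
                           [(l - lagbar) / lagsd for l in lag])

print(i_stat, slope_centered, slope_shared, slope_separate)
```

On this toy data the first two slopes match Moran's I, while the separately standardized version does not: dividing the lag by its own sd rescales the slope by sd(y)/sd(Wy), which is consistent with the 0.777 versus 0.6454 gap reported above.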

sjsrey avatar Jan 06 '22 17:01 sjsrey

After #225 I think it is good to go.

sjsrey avatar Jan 06 '22 17:01 sjsrey