CHAPTER: Global Spatial Autocorrelation
A first draft of questions for this chapter has been filed in #20
In response to questions,
Questions are solid, discussion is good, and everything should proceed :smile:
I think it's a fair comment. What'd be a good way to keep track of it? Maybe to include on the list of things for the revision stage of the chapter?
yeah, linked here, it should be addressed before closing this. I reformatted the Q so that it has a checkbox for its completion. Only when that's complete should we close this issue & move to stable!
Cool! I can't remember if the chapter needs revision too or it's good to go? Last thing I said on the previous repo was:
Just made the final changes to the chapter. I think this needs a light read by somebody to make sure it's good to go, but I don't anticipate any big changes or additions required. Assigned to @sjsrey as it dovetails with his drafting of Ch. 7 on Local autocorrelation.
I think the code in the notebook is based on an older version of pysal than is in the container:
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
# Generate W from the GeoDataFrame
w = weights.Distance.KNN.from_dataframe(db, k=8)
# Row-standardization
w.transform = 'R'
------------------
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-db177a29e4e5> in <module>
1 # Generate W from the GeoDataFrame
----> 2 w = weights.Distance.KNN.from_dataframe(db, k=8)
3 # Row-standardization
4 w.transform = 'R'
AttributeError: module 'pysal.lib.weights' has no attribute 'Distance'
AttributeError: module 'pysal.lib.weights' has no attribute 'Distance'
make: *** [Makefile:23: latex] Error 1
(gdsbook) gds/foundry - [jupytext●] » python
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysal
>>> pysal.__version__
'2.0.0'
>>>
And for the container (refreshed this morning) I have:
docker run -it -p 8888:8888 -v /home/serge/Dropbox/g/gds/foundry:/home/jovyan/host gdsbook /bin/bash
jovyan@2c51554c8b90:~$ python
Python 3.7.1 | packaged by conda-forge | (default, Feb 18 2019, 01:42:00)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysal
/opt/conda/lib/python3.7/site-packages/pysal/model/spvcm/abstracts.py:10: UserWarning: The `dill` module is required to use the sqlite backend fully.
from .sqlite import head_to_sql, start_sql
>>> pysal.__version__
'2.0.0'
>>>
Yes, that was one of the things that weren't quite right. I fixed this and a bunch of other general bits (e.g. data paths) in #51. I'd suggest merging that first.
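For reference, a minimal sketch of what the updated cell could look like. This is an assumption about the shape of the fix, not a copy of what went into #51: it relies on `KNN` being exposed directly on the `weights` namespace rather than under a `Distance` submodule. The helper name `build_knn_weights` is hypothetical and only there to keep the sketch self-contained.

```python
def build_knn_weights(db, k=8):
    """Build row-standardized k-nearest-neighbor weights from a GeoDataFrame.

    Hypothetical helper for illustration; `db` is assumed to be the
    GeoDataFrame used in the failing notebook cell.
    """
    # Imported inside the function so the sketch can be read (and the
    # function defined) without PySAL installed.
    # With pysal 2.0 the equivalent import is: from pysal.lib import weights
    from libpysal import weights

    # KNN lives directly on `weights`; there is no `weights.Distance`
    w = weights.KNN.from_dataframe(db, k=k)
    # Row-standardization
    w.transform = "R"
    return w
```

In the notebook this would be called as `w = build_knn_weights(db, k=8)`, or the two lines inside the function could replace the failing cell directly.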
In this cell, Mathias Schlaeffer writes:
I was going through the geographic data science book you co-authored and I have to first say a big thank you for co-authoring this book. It is a great resource and as concise as it can be, well done and thanks again.
But I also come with a question regarding Moran's I which confused me slightly. It is described as the slope of the line fit in the Moran's scatterplot. But I suspect a small mistake went into the part creating the scatterplot. You standardize the variables Pct_Leave and Pct_Leave_Lag separately, each against their respective STDDEV and AVG, where I believe the variable Pct_Leave_Lag should also be standardized against the Avg(Pct_Leave) and Stddev(Pct_Leave).
I tested the hypothesis by recreating the variables for the scatterplot, and then deriving Moran's I from the actual slope of the line fit rather than the functionality provided in PySAL. I get a slope of 0.777, which is higher than the Moran's I of 0.6454 cited later on. But if I instead use the avg and stddev of Pct_Leave as described, I get a value of approx 0.64 as the slope.
I think that's right; we need to standardize using the mean/sd of y, not Wy?
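A quick way to check this without the Brexit data is a toy example. The sketch below uses made-up values on a line of 50 units with row-standardized neighbor weights (all names and data are hypothetical) and compares the fitted slope under each standardization against Moran's I computed from its formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Hypothetical toy data: a smooth trend plus noise along a line of n units
y = np.linspace(0, 1, n) + rng.normal(scale=0.2, size=n)

# Row-standardized contiguity weights on a line (neighbors i-1 and i+1)
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)

# Moran's I from its definition; n / S0 = 1 when W is row-standardized
z = y - y.mean()
moran_i = (z @ W @ z) / (z @ z)

lag = W @ y
y_std = (y - y.mean()) / y.std()

# (a) lag standardized against its OWN mean and sd
lag_own = (lag - lag.mean()) / lag.std()
# (b) lag standardized against the mean and sd of y
lag_y = (lag - y.mean()) / y.std()
# (c) plain deviations from the mean of y, no division by the sd
lag_dev = W @ z

slope_own = np.polyfit(y_std, lag_own, 1)[0]
slope_y = np.polyfit(y_std, lag_y, 1)[0]
slope_dev = np.polyfit(z, lag_dev, 1)[0]

print(f"Moran's I:                  {moran_i:.4f}")
print(f"slope, lag with own mean/sd: {slope_own:.4f}")
print(f"slope, lag with y's mean/sd: {slope_y:.4f}")
print(f"slope, deviations only:      {slope_dev:.4f}")
```

Variants (b) and (c) should reproduce Moran's I, while (a) gives the correlation between y and its lag instead, which is a different number whenever the lag has a smaller spread than y.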
This is ready for a second pass.
For Further Reading:
- https://escholarship.org/uc/item/3ph5k0d4 (Anselin what is special)
- http://gistbok.ucgis.org/bok-topics/global-measures-spatial-association
- Getis, A. (2007). Reflections on spatial autocorrelation. Regional Science & Urban Economics, 37: 491-496. DOI: 10.1016/j.regsciurbeco.2007.04.005
So for the Moran Plot it's deviations from the mean of y, not also divided by the sd.
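A short derivation of why the sd does not matter for the slope, assuming a row-standardized $W$ (as in the chapter) and writing $z_i = y_i - \bar{y}$. The notation is mine, not necessarily the chapter's.

```latex
% Moran's I, with S_0 = \sum_i \sum_j w_{ij} = n under row-standardization
I = \frac{n}{S_0} \, \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}
  = \frac{\sum_i z_i \, (Wz)_i}{\sum_i z_i^2}

% OLS slope of the lag (Wz) on z; the intercept term drops out
% because \sum_i z_i = 0
b = \frac{\sum_i z_i \left[ (Wz)_i - \overline{Wz} \right]}{\sum_i z_i^2}
  = \frac{\sum_i z_i \, (Wz)_i}{\sum_i z_i^2} = I

% Dividing both axes by the same s_y leaves the slope unchanged
\frac{\sum_i (z_i / s_y) \, (Wz)_i / s_y}{\sum_i (z_i / s_y)^2} = b
```

Because the rows of $W$ sum to one, $Wz = Wy - \bar{y}$, so the lag is centered on the mean of $y$ and not on its own mean. Dividing both variables by $s_y$ is harmless, but dividing the lag by its own sd rescales the slope by $s_y / s_{Wy}$.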
After #225 I think it is good to go.