esda
Potential wrong row-standardization in lee.py
I could be wrong about this, but it has been confusing me for some time.
In the spatial Pearson statistic functions in lee.py, at line 83 and line 198, row-standardization is written as:
standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)
However, because the shape of self.connectivity.sum(axis=1) is (n,), broadcasting aligns it with the columns: entry (i, j) is divided by the sum of row j rather than the sum of row i.
Just by checking standard_connectivity.sum(axis=1) you will find that the rows do not sum to 1. This is problematic for the local spatial Pearson statistic.
A possible fix could be standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)[:, numpy.newaxis]?
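To make the broadcasting concern concrete, here is a minimal sketch with a plain numpy array. The 3x3 matrix is made up for illustration and is not taken from esda; it only shows what happens when the connectivity is a dense ndarray.

```python
import numpy

# Hypothetical dense connectivity matrix, used only for illustration.
connectivity = numpy.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0],
])

row_sums = connectivity.sum(axis=1)  # shape (3,)

# Broadcasting aligns the (3,) vector with the last axis, so entry (i, j)
# is divided by the sum of row j, not the sum of row i.
as_written = connectivity / row_sums

# A trailing axis gives the divisor shape (3, 1), so each row is divided
# by its own sum.
proposed = connectivity / row_sums[:, numpy.newaxis]

print(as_written.sum(axis=1))  # rows do not sum to 1
print(proposed.sum(axis=1))    # every row sums to 1
```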
Looking forward to your discussion!
Yes, thanks! That does look like an issue. I think we need to use self.connectivity.sum(axis=1, keepdims=True) to fix it. Will get this done ASAP.
Thanks! @ljwolf
xref #91
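Actually, this is correct as written thanks to the vagaries of the matrix API. Summing a scipy.sparse matrix along axis=1 returns a numpy.matrix of shape (n, 1), not an array of shape (n,), so the division broadcasts across rows: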
>>> from scipy import sparse
>>> import numpy
>>> numpy.random.seed(111211)
>>> connectivity = sparse.random(10,10,density=.4)
>>> connectivity.sum(axis=1)
matrix([[2.50042315],
        [3.55327513],
        [0.79089601],
        [1.44499382],
        [1.63995059],
        [2.36368386],
        [1.9093926 ],
        [3.15104416],
        [2.07759805],
        [0.80082077]])
The documentation says that connectivity must be a scipy.sparse matrix, which is what you get from using w.sparse.
If you pass a numpy array describing connectivity that you've built yourself, this will be silently incorrect. But, we document the expected input type, and for that type, the code is correct.
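The difference between the two input types can be checked side by side. This is a small sketch with a made-up 3x3 matrix (not esda code), assuming the legacy scipy.sparse.csr_matrix class, whose sum(axis=1) returns a numpy.matrix column.

```python
import numpy
from scipy import sparse

# Hypothetical connectivity values, used only for illustration.
dense = numpy.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0],
])
sparse_connectivity = sparse.csr_matrix(dense)

# Sparse matrix: the row sums come back as a numpy.matrix of shape (3, 1).
sparse_row_sums = sparse_connectivity.sum(axis=1)
# Dense array: the row sums come back as an ndarray of shape (3,).
dense_row_sums = dense.sum(axis=1)

# The same expression, applied to each input type.
sparse_standard = sparse_connectivity / sparse_row_sums
dense_standard = dense / dense_row_sums

# Row totals after standardization, flattened to 1-D for comparison.
sparse_check = numpy.asarray(sparse_standard.sum(axis=1)).ravel()
dense_check = dense_standard.sum(axis=1)

print(sparse_row_sums.shape, sparse_check)  # rows sum to 1
print(dense_row_sums.shape, dense_check)    # rows do not sum to 1
```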
I will add a fix that ensures the row-standardization also works for array types. Eventually, we need to move to scipy.sparse.csc_array(), but that will happen later.
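One way such a type-agnostic standardization might look is sketched below. The helper name row_standardize is hypothetical and this is not the fix that was merged into esda; it also assumes every row has a nonzero sum (no isolates), since a zero row sum would divide by zero.

```python
import numpy
from scipy import sparse


def row_standardize(connectivity):
    """Divide each row by its sum (hypothetical helper, not esda's API)."""
    # Coerce the row sums to an ndarray column of shape (n, 1), whether
    # they arrive as a numpy.matrix (sparse input) or an ndarray (dense).
    row_sums = numpy.asarray(connectivity.sum(axis=1)).reshape(-1, 1)
    if sparse.issparse(connectivity):
        # Elementwise multiply by the reciprocal keeps the result sparse.
        return sparse.csr_matrix(connectivity.multiply(1.0 / row_sums))
    return numpy.asarray(connectivity) / row_sums


# Made-up example data with no empty rows.
dense = numpy.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0],
])
from_dense = row_standardize(dense)
from_sparse = row_standardize(sparse.csr_matrix(dense))

# Both inputs give the same row-standardized values.
print(numpy.allclose(from_dense, from_sparse.toarray()))
```

Coercing the row sums to an explicit (n, 1) column removes the dependence on whether sum(axis=1) returns a matrix or an array.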