esda
esda copied to clipboard
Potential wrong row-standardization in lee.py
I could be wrong about this but it's confusing me for some time.
In spatial pearson statistic functions lee.py, line 83 and line 198, row-standardization is writen as:
standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)
However, becaused the shape of self.connectivity.sum(axis=1)
is (n,), when broadcasting it is row-wise division instead of column-wise.
Just by checking standard_connectivity.sum(axis=1)
you will find it is not equal to 1. And this is problematic to the local spatial pearson statistic.
Possible editting could be standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)[:, numpy.newaxis]
?
Looking forward to your discussion!
Yes, thanks! That does look like an issuse. I think we need to use self.connectivity.sum(axis=1, keepdims=True)
to fix. Thanks! Will get this done ASAP.
Thanks! @ljwolf
xref #91
Actually, this is correct as written thanks to the vagaries of the matrix API:
>>> from scipy import sparse
>>> import numpy
>>> numpy.random.seed(111211)
>>> connectivity = sparse.random(10,10,density=.4)
>>> connectivity.sum()
matrix([[2.50042315],
[3.55327513],
[0.79089601],
[1.44499382],
[1.63995059],
[2.36368386],
[1.9093926 ],
[3.15104416],
[2.07759805],
[0.80082077]])
The documentation says that connectivity
must be of type scipy.sparse.matrix
, and is what comes from using w.sparse
.
If you pass a numpy
array describing connectivity that you've built yourself, this will be silently incorrect. But, we document the expected input type, and for that type, the code is correct.
I will add a fix that ensures this correction works for array types as well. Eventually, we need to move to scipy.sparse.csc_array()
, but that will occur later.