esda Potential wrong row-standardization in lee.py

Potential wrong row-standardization in lee.py

Open ChaozhongLiu opened this issue 2 years ago • 2 comments

I could be wrong about this but it's confusing me for some time.

In spatial pearson statistic functions lee.py, line 83 and line 198, row-standardization is writen as: standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)

However, becaused the shape of self.connectivity.sum(axis=1) is (n,), when broadcasting it is row-wise division instead of column-wise.

Just by checking standard_connectivity.sum(axis=1) you will find it is not equal to 1. And this is problematic to the local spatial pearson statistic.

Possible editting could be standard_connectivity = self.connectivity / self.connectivity.sum(axis=1)[:, numpy.newaxis]?

Looking forward to your discussion!

May 27 '22 17:05 ChaozhongLiu

Yes, thanks! That does look like an issuse. I think we need to use self.connectivity.sum(axis=1, keepdims=True) to fix. Thanks! Will get this done ASAP.

May 27 '22 18:05 ljwolf

Thanks! @ljwolf

May 27 '22 18:05 ChaozhongLiu

xref #91

Nov 25 '22 16:11 jGaboardi

Actually, this is correct as written thanks to the vagaries of the matrix API:

>>> from scipy import sparse
>>> import numpy
>>> numpy.random.seed(111211)
>>> connectivity = sparse.random(10,10,density=.4)
>>> connectivity.sum()
matrix([[2.50042315],
        [3.55327513],
        [0.79089601],
        [1.44499382],
        [1.63995059],
        [2.36368386],
        [1.9093926 ],
        [3.15104416],
        [2.07759805],
        [0.80082077]])

The documentation says that connectivity must be of type scipy.sparse.matrix, and is what comes from using w.sparse.

If you pass a numpy array describing connectivity that you've built yourself, this will be silently incorrect. But, we document the expected input type, and for that type, the code is correct.

I will add a fix that ensures this correction works for array types as well. Eventually, we need to move to scipy.sparse.csc_array(), but that will occur later.

May 24 '23 22:05 ljwolf

esda esda copied to clipboard

Potential wrong row-standardization in lee.py

esda
esda copied to clipboard