libpysal
`WSP` builder from rasters returns integers only
The recommended `WSP` builder in the raster section of weights seems to return `WSP` objects that can only hold `int8` values. This makes it fail, for example, on computations that require products, like LISA statistics.
From the raster demo:

```python
from libpysal.weights import Queen, raster
import numpy as np
import xarray as xr
from esda import Moran_Local

ds = xr.tutorial.open_dataset("air_temperature.nc")  # -> returns an xarray.Dataset object
da = ds["air"]  # we'll use the "air" data variable for further analysis
da = da.groupby('time.month').mean()
coords_labels = {}
coords_labels["z_label"] = "month"  # since month does not belong to the default list we need to pass it using a dictionary
w_queen = Queen.from_xarray(
    da, z_value=12, coords_labels=coords_labels, sparse=False
)
```
`w_queen` is a full `W` and seems to work as expected:
```python
w_queen.transform = 'r'
np.unique(w_queen.sparse.data)
# -> array([0.125, 0.2, 0.33333333])
```
If we build it straight into a `WSP`, which is much faster and more memory efficient (and the current default):
```python
w_queen = Queen.from_xarray(
    da, z_value=12, coords_labels=coords_labels, sparse=True
)
w_queen.transform = 'r'
np.unique(w_queen.sparse.data)
# -> array([1], dtype=int8)
```
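For reference, the truncation can be reproduced without rasters at all: row-standardising a binary `int8` sparse matrix loses the fractional weights, while casting to `float64` first preserves them. A minimal sketch using plain `scipy.sparse` (the matrix here is illustrative, not from the demo):

```python
import numpy as np
from scipy import sparse

# A small binary contiguity matrix stored as int8, mimicking the WSP
# returned by da2WSP (illustrative data only).
sw = sparse.csr_matrix(
    np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=np.int8)
)
print(np.unique(sw.data))  # all 1s with dtype int8, as in the report

# Workaround: cast to float64 before row-standardising so the
# fractional weights are representable.
sw_f = sw.astype(np.float64)
row_sums = np.asarray(sw_f.sum(axis=1)).ravel()
w_std = sparse.diags(1.0 / row_sums) @ sw_f
print(np.unique(w_std.data))  # fractional weights survive (0.5 here)
```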
Any ideas why it does not switch to floats to allow for the weights to adjust?
cc @MgeeeeK, who might have some suggestions since they worked on this intensively.
Probably a bug, lemme check!
Hmm... `da2WSP()` doesn't deal with standardization, so that option is ignored when `sparse=True`, since `.from_xarray()` just propagates its arguments down to the `da2W()` and `da2WSP()` functions via splatting (`**kwargs`).
Should be a one-line fix, either in `da2WSP()` or directly in the `.from_xarray()` method... probably best in `da2WSP()`. In theory... we should have a centralized "standardizer" mixin class 😄
> Probably a bug, lemme check!

whoops, should've left this for you @MgeeeeK, sorry :/ just saw it in the morning email & recalled the standardization options...
`int8` is hardcoded here:
https://github.com/pysal/libpysal/blob/0f03a313896de6af096ff92105e54da86bb78368/libpysal/weights/raster.py#L635
I think that makes sense as a default as it's more memory efficient, but I thought it'd convert directly into other types (e.g. `float64`) if multiplied. Perhaps that is not occurring?
I think the ideal behaviour is that, by default, the binary weights are `int8`, but if we transform the weights (e.g. row-standardise, or even compute a spatial lag) it'd accept the operations.
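To sanity-check the upcasting intuition: SciPy does upcast when you multiply by a float, but writing float values back into the existing `int8` buffer truncates them, which is what an in-place transform on an `int8` `WSP` would hit. A quick illustration with plain SciPy (outside libpysal):

```python
import numpy as np
from scipy import sparse

sw = sparse.csr_matrix(np.eye(3, dtype=np.int8))

# Multiplying by a float upcasts the *result* to float64...
prod = sw.multiply(0.5)
print(prod.dtype)  # float64

# ...but assigning floats into the existing int8 buffer truncates them.
sw.data[:] = 0.5
print(np.unique(sw.data))  # [0] -- 0.5 truncated to int8
```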
> whoops, should've left this for you @MgeeeeK, sorry :/ just saw it in the morning email & recalled the standardization options...
https://github.com/pysal/libpysal/blob/0f03a313896de6af096ff92105e54da86bb78368/libpysal/weights/raster.py#L301
no problem! I think you are correct, sparse weights don't support transformations rn (afaiu).
> I think the ideal behaviour is that, by default, the binary weights are `int8`, but if we transform the weights (e.g. row-standardise, or even compute a spatial lag) it'd accept the operations.
that makes sense, will look into it
A similar issue affects `lat2SW`:
https://github.com/pysal/libpysal/blob/master/libpysal/weights/util.py#L1248
Would it make sense to have the dtype optionally chosen by the user, with a default of `int8` and/or an `'auto'` option that picks up the dtype of the `DataArray`? This might not be very efficient at scale but, as it turns out, it does make computations down the line easier (e.g. `numba` in the LISA complains if the `W` is expressed in `float32` and the `y` is expressed as `float64`). Should we maybe even default to `float64`, and if you know what you're doing you can set it to `int8` if you want things to run lean?
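A rough sketch of what a user-selectable dtype could look like. The function name and signature below are made up for illustration; this is not the libpysal API, just the `'auto'` idea spelled out:

```python
import numpy as np
from scipy import sparse

def build_binary_sw(n, neighbors, dtype="auto", da_dtype=np.float64):
    """Illustrative builder: binary sparse weights with a chosen dtype.

    dtype: a numpy dtype, or 'auto' to pick up the dtype of the source
    DataArray (passed in here as `da_dtype` for the sketch).
    """
    if dtype == "auto":
        dtype = da_dtype
    rows, cols = zip(*neighbors)
    data = np.ones(len(rows), dtype=dtype)
    return sparse.coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()

# 'auto' follows the (hypothetical) DataArray dtype, here float64:
sw = build_binary_sw(3, [(0, 1), (1, 0), (1, 2), (2, 1)], dtype="auto")
print(sw.dtype)  # float64

# ...while the memory-lean option is still available on request:
sw_lean = build_binary_sw(3, [(0, 1), (1, 0)], dtype=np.int8)
print(sw_lean.dtype)  # int8
```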
I've given it a bit more thought and fleshed out some ideas in this notebook. There's a quick sketch of how we could go about aligning the dtypes of `y` and `w` when running things like `Moran_Local`. It relies on a function `aligner` that takes the following arguments:
```
y
w
how : str/dtype
    [Optional. Default='y'] Alignment policy:
    - 'y'    : use `y.dtype`
    - 'w'    : use the `dtype` in `w`
    - `dtype`: a different target `dtype`
```
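A minimal sketch of what such an `aligner` could look like under that signature (illustrative only; the notebook's actual implementation, including `_convert_w_dtype`, may differ):

```python
import numpy as np
from scipy import sparse

def aligner(y, w, how="y"):
    """Cast y and w to a common dtype following the policy described above.

    how='y' uses y.dtype, how='w' uses w's dtype, anything else is
    treated as a target dtype for both. Sketch only, not the real API.
    """
    if how == "y":
        target = y.dtype
    elif how == "w":
        target = w.dtype
    else:
        target = np.dtype(how)
    return y.astype(target), w.astype(target)

y = np.arange(3, dtype=np.float64)
w = sparse.eye(3, dtype=np.int8, format="csr")

y2, w2 = aligner(y, w)              # default: align on y -> both float64
y3, w3 = aligner(y, w, how="w")     # align on w -> both int8
print(y2.dtype, w2.dtype)  # float64 float64
print(y3.dtype, w3.dtype)  # int8 int8
```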
I'll copy here my current thinking (from the bottom of the notebook):
- Something like `aligner` could be embedded in `Moran_Local` and all methods that rely on `numba`, but this would need to be added after any transformations and on each of the classes that currently rely on accelerated randomisation
- My general sense is to have a sensible default (either `y.dtype` or `float64`) and leave flexibility as an option if folks need it
- Something to keep in mind, which I've not thought too much about beyond seeing it might be an issue, is whether there are more memory efficient ways of doing all this. If `w` is a large matrix, the approach taken in `_convert_w_dtype` might not be feasible/desirable. Do we have other ways?
- At any rate, if `aligner` was called within something like `Moran_Local`, warnings would need to be raised so the user is aware. In fact, an option in `Moran_Local` could select what to do: raise an error if the `dtype`s differ, or convert them automatically.
I think there are two aspects to this issue: an immediate one that we could fix along the lines suggested, and a longer-term one that I think should be addressed in how `W` objects are stored and manipulated, perhaps in a new iteration of the current `W`/`WSP` objects.