xarray
`.fillna()` slower than expected for sparse data arrays with `fill_value=nan`
What is your issue?
Expected behavior
`.fillna(0)` should be near instantaneous when applied to a sparse DataArray with `fill_value=nan`.
Why
`.fillna(0)` only needs to update the `fill_value` to `0`.
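To illustrate the reasoning, here is a toy sketch in pure Python. `ToySparse` is a hypothetical class made up for this example; it is not part of xarray or `sparse`. It shows why the fill could cost time proportional to the number of stored entries rather than to the full array size: only the explicitly stored values and the background `fill_value` need to be inspected.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySparse:
    """Hypothetical 1-D sparse vector: explicit entries plus a background fill value."""

    size: int
    stored: Dict[int, float] = field(default_factory=dict)
    fill_value: float = math.nan

    def fillna(self, value: float) -> "ToySparse":
        # Visit only the explicitly stored entries, never the implicit ones.
        stored = {i: (value if math.isnan(v) else v) for i, v in self.stored.items()}
        # Implicit entries are NaN only when the background itself is NaN,
        # so swapping the fill value handles all of them at once.
        fill = value if math.isnan(self.fill_value) else self.fill_value
        return ToySparse(self.size, stored, fill)

    def todense(self) -> List[float]:
        # Materialize every position; used here only to check the result.
        return [self.stored.get(i, self.fill_value) for i in range(self.size)]


v = ToySparse(size=5, stored={1: 2.5, 3: math.nan})
filled = v.fillna(0.0)
```

Note that explicitly stored NaNs still have to be replaced, so the shortcut is slightly more than a bare `fill_value` update.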
Current behaviour
The normal `.where()` operation is applied on the DataArray instead of using the shortcut described above.
Question
What would be required to improve the performance of `fillna()`? I'm happy to try taking a stab at it if pointed in the right direction.
Thanks @staadecker for raising this and sorry for the delay. If this is still an issue please reopen with an MCVE.
> `.fillna(0)` should be near instantaneous when applied to a sparse DataArray with `fill_value=nan`
I was going to suggest this be implemented upstream in https://github.com/pydata/sparse/ but I don't think that's actually possible since `fillna` is `where(notnull(data), data, other)`.
So, one option is to special case `sparse` here: https://github.com/pydata/xarray/blob/f0ee037fae05cf4b69b26bae632f9297e81272ca/xarray/core/duck_array_ops.py#L361-L365
@staadecker is there a reason you can't do this manually, and that you want `fillna` to do it?
Hi @dcherian! Thanks for the response. I feel that it would be best if `.fillna()` were performant by default, rather than requiring a workaround or requiring users to write code manually. That being said, this issue is no longer relevant to me, so I'll let the library maintainers decide what they'd like to do with it.