
`.fillna()` slower than expected for sparse data arrays with `fill_value=nan`

Open staadecker opened this issue 11 months ago • 4 comments

What is your issue?

Expected behavior

.fillna(0) should be near instantaneous when applied to a sparse DataArray with fill_value=nan

Why

.fillna(0) only needs to update the fill_value to 0.

Current behavior

The normal .where() operation is applied on the DataArray instead of using the shortcut described above.

Question

What would be required to improve the performance of fillna()? I'm happy to try taking a stab at it if pointed in the right direction.

staadecker, Feb 28 '24 16:02

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot], Feb 28 '24 16:02

Thanks @staadecker for raising this and sorry for the delay. If this is still an issue please reopen with an MCVE.

kmuehlbauer, Jun 11 '24 13:06

> .fillna(0) should be near instantaneous when applied to a sparse DataArray with fill_value=nan

I was going to suggest this be implemented upstream in https://github.com/pydata/sparse/ but I don't think that's actually possible, since `fillna` is `where(notnull(data), data, other)`.
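For illustration, here is a rough dense NumPy analogue of that expression. It is only a sketch: `fillna_dense` is a made-up name, and xarray's real code goes through its own duck-array `where` and `notnull` rather than calling NumPy directly. The point is that the mask and the selection both visit every element, so the cost scales with the full array size rather than with the number of stored values.

```python
import numpy as np


def fillna_dense(data, other):
    """Hypothetical stand-in for where(notnull(data), data, other)."""
    # Build a boolean mask over every element, then select element-wise.
    not_null = ~np.isnan(data)
    return np.where(not_null, data, other)


data = np.array([1.0, np.nan, 3.0])
result = fillna_dense(data, 0.0)
```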

So, one option is to special case sparse here: https://github.com/pydata/xarray/blob/f0ee037fae05cf4b69b26bae632f9297e81272ca/xarray/core/duck_array_ops.py#L361-L365

@staadecker is there a reason you can't do this manually, and that you want `.fillna()` to do it?

dcherian, Jun 11 '24 15:06

Hi @dcherian! Thanks for the response. I feel that it would be best if .fillna() was performant by default rather than requiring a workaround or users to write code manually. That being said, this issue is no longer relevant to me so I'll let the library maintainers decide what they'd like to do with it.

staadecker, Jun 11 '24 22:06