xarray
Clarify difference between `.load()` and `.compute()`
What is your issue?
I just realized that the difference between `.load()` and `.compute()` is that `.load()` operates inplace and `.compute()` returns a new xarray object. I have 2 suggestions for how this could be clearer:

- Docs: the API docs for each method could reference the other.
- Code: this might be too big a change, but maybe `.load()` should not return anything. Consider this example from pandas:

  ```python
  import pandas as pd

  df = pd.DataFrame({"air": []})
  df.rename({"air": "foo"}, axis=1, inplace=True)  # returns None since df is renamed inplace
  ```

  This matches the behavior of inplace actions in Python itself like `list.append` or `dict.update`. This would be a major breaking change though, and it might be easier to just remove `.load()` entirely.
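The inplace-vs-pure distinction being described can be sketched with a minimal stand-in class (a hypothetical `ToyDataset`, not xarray's actual implementation):

```python
import copy


class ToyDataset:
    """Hypothetical stand-in mimicking the load/compute contract discussed above."""

    def __init__(self, data):
        self._lazy = data        # pretend this is lazy (dask/backend) data
        self._in_memory = None

    def load(self, **kwargs):
        # Inplace: materialize the data on *this* object, return self for chaining.
        self._in_memory = list(self._lazy)
        return self

    def compute(self, **kwargs):
        # Pure: shallow-copy first, then load the copy, leaving self untouched.
        new = copy.copy(self)
        return new.load(**kwargs)


ds = ToyDataset(range(3))
computed = ds.compute()   # new, loaded object; ds itself is still lazy
loaded = ds.load()        # ds itself is now loaded; the same object is returned
assert computed is not ds and loaded is ds
```

Under this contract, dropping the return value of `load` (as suggested) would only require `return self` to become `return None`.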
Yeah, deprecating return values is not fun. We somewhat recently did that for `Dataset.update` and found that we could only mention this in the documentation / release notes, which is not great for visibility.

Interestingly, the docstrings of `Dataset.compute` and `Dataset.load` compare with each other, while for the methods on `DataArray` and `Variable` that comparison is missing. None of them actually link to the other method, though.

Given that we have been removing inplace operations as much as possible, I'd vote to remove `load` entirely or make it an alias of `compute`.

Edit: up until I read this issue, I somehow assumed `compute` would only work with dask while `load` would also load our lazy array implementation into memory. Not sure how I got that impression, but maybe that's another argument to remove / align `load`?
Thanks for engaging with this! I think removing `load` probably makes the most sense. Making `load` an alias for `compute` would just continue the confusion since there wouldn't be a clear right way to do things.

> I somehow assumed `compute` would only work with dask while `load` would also load our lazy array implementation into memory. Not sure how I got that impression, but maybe that's another argument to remove / align `load`?
I'm not quite sure I follow. Are there lazy array implementations other than dask in xarray?
> Are there lazy array implementations other than dask in xarray?

No, sorry, I mistyped. What I was referring to is the lazy loading mechanism that will load the data into memory once you try to actually do something with it.
However, it might be possible to wrap other lazy array implementations if they implement either `__array_ufunc__` and `__array_function__`, or `__array_namespace__` (see #6807).
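As a sketch of what such wrapping relies on, here is a hypothetical minimal duck array (illustrative only, not an actual xarray backend) that implements `__array_function__` so NumPy functions can dispatch to it:

```python
import numpy as np


class LazyArray:
    """Hypothetical minimal lazy duck array (illustrative only)."""

    def __init__(self, shape, dtype=float):
        self.shape = shape
        self.dtype = np.dtype(dtype)
        self.ndim = len(shape)

    def __array__(self, dtype=None, copy=None):
        # Stand-in for real lazy loading: materialize zeros on demand.
        return np.zeros(self.shape, dtype=dtype or self.dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Materialize any LazyArray arguments, then defer to NumPy's implementation.
        args = [np.asarray(a) if isinstance(a, LazyArray) else a for a in args]
        return func(*args, **kwargs)


arr = LazyArray((2, 3))
total = np.sum(arr)  # NumPy dispatches through __array_function__
```

A real implementation would materialize actual backend data instead of zeros, but the dispatch mechanism is the same.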
👍🏾 for updating the docstring.

IME `load` is commonly used so I don't know that it's worth removing. I can imagine a scenario where `load` is de-emphasized in the narrative docs, and `compute` is used instead. It wouldn't have to be a formal deprecation, but it could be that `compute` is presented as best practice.
While there is overlap in the behavior here, I've always thought of these two methods as having distinct applications. `.compute()` is a Dask collection method. `.load()`, which predates Xarray's Dask integration, was originally meant to load data from our lazy-loading backend arrays (e.g. a netCDF file).
> Edit: up until I read this issue, I somehow assumed `compute` would only work with dask while `load` would also load our lazy array implementation into memory. Not sure how I got that impression, but maybe that's another argument to remove / align `load`?

This was my impression as well, but now I understand that the primary difference is that `load` is inplace while `compute` is not.

I should say that there might very well be other uses of `load` that I am not aware of. The inplace vs. not is just the only difference that I noticed in the context of dask-backed xarray objects.
From this discussion I think for dask-backed objects we should recommend using `compute`. Maybe that is enough of a decision.
Note that the implementation of `compute` is currently:

```python
def compute(self, **kwargs):
    new = self.copy(deep=False)
    return new.load(**kwargs)
```

which means that inplace vs. "pure function" is literally the only difference (though I don't know whether doing a shallow copy has other implications).