package `xarray` and `xarray-core` in conda-forge
What is your issue?
The current set of Xarray dependencies is very minimal. https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L25-L29
This is pretty unfriendly to a new user, and not a great out-of-the-box experience. You can't read any files (except npz, csv, parquet I guess), you can't access any tutorial datasets, you can't make plots, and you're missing a bunch of effectively free performance optimizations.
I think the current set of minimal dependencies is more appropriate to an xarray-core package.
Here are our optional dependencies for example:
https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L31-L48
Proposal
I suggest that we migrate to separate `xarray-core` and `xarray` packages in conda-forge:
- `xarray-core` will have the current set of minimal dependencies.
- For `xarray` I propose the following dependencies:
  - `flox`, `opt_einsum`, `numbagg` for accelerated computations
  - `fsspec`, `netcdf`, `zarr` for reading common datasets & "cloud"
  - `matplotlib` for plotting
  - `pooch` to read tutorial datasets
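To make the gap concrete, here is a minimal sketch (not part of xarray) that reports which of the proposed dependencies are missing from the current environment. The grouping mirrors the list above; `PROPOSED_EXTRAS` and `missing_extras` are hypothetical names, and `netCDF4` is an assumed import name for the "netcdf" entry.

```python
from importlib.util import find_spec

# Import names for the proposed dependencies, grouped by purpose.
# Assumption: "netcdf" in the proposal corresponds to the "netCDF4" module.
PROPOSED_EXTRAS = {
    "accelerated computations": ["flox", "opt_einsum", "numbagg"],
    "reading common datasets": ["fsspec", "netCDF4", "zarr"],
    "plotting": ["matplotlib"],
    "tutorial datasets": ["pooch"],
}


def missing_extras(groups=PROPOSED_EXTRAS):
    """Return {purpose: [module names that cannot be found]}, skipping complete groups."""
    missing = {}
    for purpose, modules in groups.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[purpose] = absent
    return missing


if __name__ == "__main__":
    for purpose, modules in missing_extras().items():
        print(f"{purpose}: missing {', '.join(modules)}")
```

In an environment that only has the minimal dependencies, I would expect every group to be reported, which is roughly the out-of-the-box experience described above.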
Related: dask is split into `dask-core` and `dask` packages, and I think matplotlib does the same with `matplotlib-base`.
Note that there are many user survey comments asking for performance improvements
- "whatever can speed up computations would be welcomed"
- Optimizations, especially for "resample" and "rolling".
- faster computation
- Faster/smaller dask graph parallelizations?
- faster, less overhead
- "also, can we fix the fact that groupby() on a dimension with only one chunk returns something with a chunk size of one on that dimension? It produces huge graph sizes."
And then this counter-example 🤷🏾♂️ : "lightweight version without heavy dependencies"
(Yes, I thought I asked something similar a while ago around pooch but can't find it)
Ideally we would allow {name="xarray", default-features=false} for the minority of users that want the slim version. But IIUC python doesn't have any notion of "default but not required dependencies".
- So +1 on `xarray-core` vs `xarray` in that case
- Another option would be encouraging `xarray[standard]`, but that doesn't seem like a common thing in python either
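For context on the `xarray[standard]` idea: pip extras are recorded in a distribution's metadata as requirement strings carrying an `extra == "..."` marker, and nothing in that metadata marks an extra as installed by default. A rough sketch of inspecting them with the standard library follows; `group_by_extra` and `declared_extras` are hypothetical helper names, and the marker parsing is deliberately simplistic.

```python
import re
from collections import defaultdict
from importlib.metadata import PackageNotFoundError, requires

# Matches the extra name in a marker such as: extra == "io"
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requirements):
    """Split requirement strings into {extra name or 'required': [specs]}."""
    groups = defaultdict(list)
    for req in requirements:
        # Everything after ';' is the environment marker, if any.
        spec, _, marker = req.partition(";")
        match = _EXTRA.search(marker)
        groups[match.group(1) if match else "required"].append(spec.strip())
    return dict(groups)


def declared_extras(dist_name="xarray"):
    """Group the declared requirements of an installed distribution by extra."""
    try:
        return group_by_extra(requires(dist_name) or [])
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return {}


if __name__ == "__main__":
    for extra, specs in declared_extras().items():
        print(f"{extra}: {', '.join(specs)}")
```

Anything listed under an extra is only pulled in when the user asks for it explicitly, e.g. `pip install "xarray[io]"`, which is why a separate conda-forge package looks like the simpler route.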
Looking at the 👍🏾 on the first post we seem to have general agreement.
How shall we proceed here? We could:
- Begin with an announcement
- Start distributing `xarray-core`
- Change the recipe for `xarray` in two months?
@shoyer it'd be good to get your vote here.
I am fine changing the package on conda-forge, this seems more in line with user expectations. Not sure we need opt-einsum, but otherwise the core list of recommended dependencies looks good.
@pydata/xarray can someone volunteer to take this on please?