package `xarray` and `xarray-core` in conda-forge

Open dcherian opened this issue 1 year ago • 2 comments

What is your issue?

The current set of Xarray dependencies is very minimal. https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L25-L29

This is pretty unfriendly to a new user, and not a great out-of-the-box experience. You can't read any files (except npz, csv, parquet I guess), you can't access any tutorial datasets, you can't make plots, and you're missing a bunch of effectively free performance optimizations.

I think the current set of minimal dependencies is more appropriate to an xarray-core package. Here are our optional dependencies for example: https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L31-L48

Proposal

I suggest that we migrate to xarray-core and xarray packages in conda-forge:

  1. xarray-core will have the current set of minimal dependencies.
  2. For xarray I propose the following dependencies:
    1. flox, opt_einsum, numbagg for accelerated computations
    2. fsspec, netcdf, zarr for reading common datasets & "cloud"
    3. matplotlib for plotting.
    4. pooch to read tutorial datasets
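To see what the proposed split would mean for an existing environment, here is a minimal sketch that reports which of the packages listed above are currently importable. The grouping mirrors the list; the import name `netCDF4` is an assumption (the list only says "netcdf"), and `missing_extras` is a hypothetical helper, not part of xarray.

```python
from importlib.util import find_spec

# Import names for the packages proposed above, grouped by purpose.
# "netCDF4" is an assumption: the proposal only says "netcdf".
PROPOSED = {
    "accelerated computations": ["flox", "opt_einsum", "numbagg"],
    "reading common datasets": ["fsspec", "netCDF4", "zarr"],
    "plotting": ["matplotlib"],
    "tutorial datasets": ["pooch"],
}


def missing_extras(groups=PROPOSED):
    """Return {purpose: [modules not found]} for the current environment."""
    missing = {}
    for purpose, modules in groups.items():
        # find_spec returns None when a top-level module cannot be located.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[purpose] = absent
    return missing


if __name__ == "__main__":
    for purpose, modules in missing_extras().items():
        print(f"{purpose}: missing {', '.join(modules)}")
```

An empty result would mean the environment already matches what the fuller xarray package is proposed to install.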

Related: dask is packaged alongside dask-core, and I think matplotlib is packaged alongside matplotlib-base.

dcherian avatar Jun 20 '24 23:06 dcherian

Note that there are many user survey comments asking for performance improvements:

  • "whatever can speed up computations would be welcomed"
  • Optimizations, especially for "resample" and "rolling".
  • faster computation
  • Faster/smaller dask graph parallelizations?
  • faster, less overhead
  • "also, can we fix the fact that groupby() on a dimension with only one chunk returns something with a chunk size of one on that dimension? It produces huge graph sizes."

And then this counter-example 🤷🏾‍♂️ : "lightweight version without heavy dependencies"

dcherian avatar Jun 20 '24 23:06 dcherian

(Yes, I thought I asked something similar a while ago around pooch but can't find it)

Ideally we would allow {name="xarray", default-features=false} for the minority of users that want the slim version. But IIUC python doesn't have any notion of "default but not required dependencies".

  • So +1 on xarray-core vs xarray in that case
  • Another option would be encouraging xarray[standard], but that doesn't seem like a common thing in python either
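For context on the extras option, a rough sketch of how to inspect which extras an installed distribution declares, using only the standard library. `declared_extras` is a hypothetical helper; whether a `standard` extra exists for xarray is not something this thread confirms, so the code only lists whatever the installed metadata reports.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name):
    """Return the extras declared by an installed distribution, or [] if none."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # "Provides-Extra" is the core-metadata field that lists extras;
    # get_all returns None when the field is absent.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    print(declared_extras("xarray"))
```

Extras are opt-in: a plain install of the package name never pulls them in, which is the limitation described above.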

max-sixty avatar Jun 20 '24 23:06 max-sixty

Looking at the 👍🏾 on the first post we seem to have general agreement.

How shall we proceed here? We could:

  1. Begin with an announcement
  2. Start distributing xarray-core
  3. Change the recipe for xarray in two months?

@shoyer it'd be good to get your vote here.

dcherian avatar Jul 11 '24 04:07 dcherian

I am fine changing the package on conda-forge, this seems more in line with user expectations. Not sure we need opt-einsum, but otherwise the core list of recommended dependencies looks good.

shoyer avatar Jul 11 '24 06:07 shoyer

@pydata/xarray can someone volunteer to take this on please?

dcherian avatar Jul 31 '24 17:07 dcherian