
list available backends and basic descriptors

Open JessicaS11 opened this issue 1 year ago • 25 comments

Wanted to garner maintainer feedback on the basic implementation before I add any docs. I will need some guidance for adding tests.

  • [x] Closes #6577
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

JessicaS11 avatar Sep 06 '22 21:09 JessicaS11

I know that it is more work, but I think it would be more beneficial to have this information formatted nicely than simply returning some lists. Maybe similar to a pandas dataframe?

headtr1ck avatar Sep 07 '22 05:09 headtr1ck

I know that it is more work, but I think it would be more beneficial to have this information formatted nicely than simply returning some lists.

Sounds good - I'll make that happen!

JessicaS11 avatar Sep 07 '22 20:09 JessicaS11

Hey @JessicaS11! Nice to see you here. This is going to be a great first-time contribution!

I know that it is more work, but I think it would be more beneficial to have this information formatted nicely than simply returning some lists. Maybe similar to a pandas dataframe?

One approach might be to add a nice __str__, __repr__, (or even _repr_html_) to the BackendEntrypoint class. That way printing a list (or better a dict) of the engine-specific subclasses would automatically look decent. Something roughly like

class BackendEntrypoint:
    ...

    def __str__(self) -> str:
        # Start from the description so that txt is defined before appending.
        txt = self.backend_description
        if self.backend_url:
            txt += f"\nFind more info at {self.backend_url}"
        return txt

What might be clearer is if this backend class knew its own name, but I'm not sure it can, hmmm
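One caveat worth keeping in mind: printing a list or dict calls `repr()` on each element rather than `str()`, so defining only `__str__` would leave the container output unchanged. Below is a minimal sketch of that behaviour. The class is a hypothetical stand-in (not xarray's actual `BackendEntrypoint`), and the attribute values and URL are made up for illustration.

```python
class DemoBackendEntrypoint:
    """Hypothetical stand-in for a backend class; attribute values are invented."""

    backend_description = "Open example files"
    backend_url = "https://example.invalid/docs"

    def __repr__(self) -> str:
        txt = f"<{type(self).__name__}>"
        if self.backend_description:
            txt += f"\n  {self.backend_description}"
        if self.backend_url:
            txt += f"\n  Find more info at {self.backend_url}"
        return txt

    # No separate __str__ is defined: object.__str__ falls back to __repr__.


engines = {"demo": DemoBackendEntrypoint()}
# A dict's repr calls repr() on each value, so the custom text appears here.
print(engines)
```

Defining `__repr__` and letting `__str__` fall back to it covers both the single-object and the container case with one method.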

TomNicholas avatar Sep 08 '22 15:09 TomNicholas

One approach might be to add a nice __str__, __repr__, (or even _repr_html_) to the BackendEntrypoint class. That way printing a list (or better a dict) of the engine-specific subclasses would automatically look decent.

That would not display in a nice table though. But it might be nice to add anyway ;)

What might be clearer is if this backend class knew its own name, but I'm not sure it can, hmmm

What about type(self).__name__

headtr1ck avatar Sep 08 '22 15:09 headtr1ck

That would not display in a nice table though. But it might be nice to add anyway ;)

So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?

What about type(self).__name__

Correct me if I'm wrong, but if a subclassed ZarrBackend object was added as an entrypoint under the key "zarr", wouldn't type(self).__name__ return "ZarrBackend" rather than "zarr", even though "zarr" is what the user would actually have to pass to the engine kwarg of open_dataset to use that backend? Not ideal :/
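To make the mismatch concrete, here is a small sketch. The `ZarrBackend` class and the `registry` dict are hypothetical stand-ins for a backend registered under the key "zarr"; they are not xarray's real objects.

```python
class ZarrBackend:
    """Hypothetical backend class used only for illustration."""


# Hypothetical registry: keys are what a user would pass as ``engine=...``.
registry = {"zarr": ZarrBackend()}

for engine_name, backend in registry.items():
    class_name = type(backend).__name__
    # The class name and the engine key are independent strings.
    print(f"engine={engine_name!r} -> class {class_name}")
```

The key lives in the registry, not on the object, so the object cannot recover it from `type(self).__name__` alone.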

TomNicholas avatar Sep 08 '22 16:09 TomNicholas

Correct me if I'm wrong, but if a subclassed ZarrBackend object was added as an entrypoint under the key "zarr", wouldn't type(self).__name__ return "ZarrBackend" rather than "zarr", even though "zarr" is what the user would actually have to pass to the engine kwarg of open_dataset to use that backend? Not ideal :/

You're right. But this mapping has to be somewhere.

headtr1ck avatar Sep 08 '22 16:09 headtr1ck

That would not display in a nice table though. But it might be nice to add anyway ;)

Agreed. I added a __str__ to the BackendEntrypoint class, but...

So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?

... I haven't dreamed up an answer to this question yet. We could set it up so that __str__ instead returns a pandas series, but I don't see another way we can have a pretty __str__ that will also directly feed into a nicely formatted table for avail_engines().

JessicaS11 avatar Sep 08 '22 20:09 JessicaS11

So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?

Does it need to be the same method? Why not a method that simply returns a list of classes and another method that prints a nicely formatted list?

headtr1ck avatar Sep 10 '22 13:09 headtr1ck

Agreed. I added a __str__ to the BackendEntrypoint class, but...

This is helpful already, thank you.

Does it need to be the same method? Why not a method that simply returns a list of classes and another method that prints a nicely formatted list?

I guess not, but we also probably don't want to pollute xarray's public namespace with multiple functions that do basically the same thing. Though maybe as it's all behind the backends namespace that doesn't matter so much? Then we might have list_backends and display_backends or something?

TomNicholas avatar Sep 12 '22 13:09 TomNicholas

we also probably don't want to pollute xarray's public namespace with multiple functions that do basically the same thing. Though maybe as it's all behind the backends namespace that doesn't matter so much? Then we might have list_backends and display_backends or something?

Currently there is this method:

>>> xr.backends.plugins.list_engines()
{'scipy': <xarray.backends.scipy_.ScipyBackendEntrypoint object at 0x101ec4ee0>, 'rasterio': <rioxarray.xarray_plugin.RasterioBackend object at 0x10217de70>, 'store': <xarray.backends.store.StoreBackendEntrypoint object at 0x10217de10>}

and here we're adding (essentially to more effectively and prettily surface the above results to the user):

>>> xr.backends.api.avail_engines()
Engine   Description Documentation
scipy                             
rasterio                          
store

The goal that led us (@snowman2 @dcherian @scottyhq) to create the initial issue was giving the user an easy, obvious way to find out what engines were actually available to them in their current environment. In the process of figuring out how to do this, @snowman2 and I discovered that list_engines() already existed, we just needed to add the attributes we wanted to display, make it pretty, and put it somewhere a user might look for it. We can certainly change the name of avail_engines() to display_backends(). Is there then a dev need to have another version of list_backends(), or would a "see also" to list_engines() be a solution here?

JessicaS11 avatar Sep 12 '22 14:09 JessicaS11

We can certainly change the name of avail_engines() to display_backends(). Is there then a dev need to have another version of list_backends(), or would a "see also" to list_engines() be a solution here?

I think exposing a backends.display_backends() (defined in backends.api like BackendEntrypoint is) and then also pointing from there to the lower-level xr.backends.plugins.list_engines() would be great. That way there is only one more function listed in xarray's main API docs, and list_backends can be for dev use if necessary.

Expanding on that, we could put display_backends in the API docs under IO / conversion and add list_backends under Advanced API. That would keep distinct the two use cases of (1) "I'm a user and want to know what backends are installed for use as engine=...", and (2) "I'm a developer who is trying to add a new backend and I want to see all the actual BackendEntrypoint objects".

TomNicholas avatar Sep 12 '22 15:09 TomNicholas

Thanks Jessica!

"I'm a developer who is trying to add a new backend and I want to see all the actual BackendEntrypoint objects".

We could also just document using xr.backends.plugins.list_engines() in https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html and https://tutorial.xarray.dev/advanced/backends/backends.html

dcherian avatar Sep 12 '22 15:09 dcherian

>>> xr.backends.plugins.list_engines()
{'scipy': <xarray.backends.scipy_.ScipyBackendEntrypoint object at 0x101ec4ee0>, 'rasterio': <rioxarray.xarray_plugin.RasterioBackend object at 0x10217de70>, 'store': <xarray.backends.store.StoreBackendEntrypoint object at 0x10217de10>}

I wonder if these classes had a proper repr wouldn't that be enough?

Illviljan avatar Sep 12 '22 18:09 Illviljan

add list_backends under Advanced API

Turns out it was already available, so I added it to the Advanced API docs as well as https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html.

@dcherian I'll bookmark adding it to https://tutorial.xarray.dev/advanced/backends/backends.html.

@Illviljan, in case you didn't see this prior discussion:

So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?

... I haven't dreamed up an answer to this question yet. We could set it up so that __str__ instead returns a pandas series, but I don't see another way we can have a pretty __str__ that will also directly feed into a nicely formatted table for avail_engines().

Do you have any suggestions?

Note: avail_engines() has been renamed to show_engines()

JessicaS11 avatar Sep 12 '22 18:09 JessicaS11

My 2 cents is that list_engines should be good enough and the issue is the lack of a good repr on the backend classes.

If the dict repr from list_engines is still not good enough maybe replacing the dict with a custom MutableMapping with a good repr format could be a way forward?
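As a rough illustration of that idea, here is a minimal sketch of a dict-like container with its own repr. The names `EngineMapping` and `ScipyBackend` are hypothetical, and the repr layout is only one possible choice.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class EngineMapping(MutableMapping):
    """Hypothetical dict-like container whose repr lists one engine per line."""

    def __init__(self, engines: Dict[str, object]) -> None:
        self._engines = dict(engines)

    def __getitem__(self, name: str) -> object:
        return self._engines[name]

    def __setitem__(self, name: str, backend: object) -> None:
        self._engines[name] = backend

    def __delitem__(self, name: str) -> None:
        del self._engines[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._engines)

    def __len__(self) -> int:
        return len(self._engines)

    def __repr__(self) -> str:
        lines = [f"<EngineMapping ({len(self)} engines)>"]
        lines += [
            f"  {name}: {type(backend).__name__}" for name, backend in self.items()
        ]
        return "\n".join(lines)


class ScipyBackend:
    """Hypothetical backend class."""


engines = EngineMapping({"scipy": ScipyBackend()})
print(engines)               # readable multi-line summary
backend = engines["scipy"]   # individual objects are still selectable by key
```

This keeps the return value usable as a mapping while giving it a readable display, without any extra dependency.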

Illviljan avatar Sep 12 '22 21:09 Illviljan

I think show_engines provides a simple, user-friendly interface that is valuable for users in its current state.

snowman2 avatar Sep 12 '22 21:09 snowman2

If the dict repr from list_engines is still not good enough maybe replacing the dict with a custom MutableMapping with a good repr format could be a way forward?

I like this idea more than using a pd.DataFrame, which is quite a heavy dependency to use for just displaying some strings. (We have other plans to eventually make pandas an optional import too.) That would also be consistent with the approach we use for ds.coords and similar.

EDIT: This would also be a good thing to split out into an (optional) future PR though, and merge returning a dict of objects with nice individual reprs for now.

TomNicholas avatar Sep 12 '22 21:09 TomNicholas

We have other plans to eventually make pandas an optional import too

That makes a good case for not using a data frame.

snowman2 avatar Sep 12 '22 21:09 snowman2

We discussed this with Jessica in the xarray community meeting today and I think the conclusion was to return a dict of backend objects with nice reprs to begin with, and if there is appetite then prettify it later, likely with a custom mutable mapping with special repr.

TomNicholas avatar Sep 14 '22 17:09 TomNicholas

Does anyone know if the backend entrypoints are supposed to have an available attribute? This: https://github.com/pydata/xarray/blob/212a5d7909e8dd54446b08574a0683e2477f2b40/xarray/backends/plugins.py#L89 seems to require one.

headtr1ck avatar Sep 25 '22 21:09 headtr1ck

Does anyone know if the backend entrypoints are supposed to have an available attribute?

Looks like they are - every subclass of BackendEntrypoint seems to define .available. In each case it points to the corresponding has_[package] flag. I think it might be safe to set .available to default to True in the base class? That way, if the backend requires no special dependencies, it would return True by default. If we think that makes sense then I'm happy to add a commit doing that so this can be merged.
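A minimal sketch of that default, under stated assumptions: the class names, the `hypothetical_io_library` package name, and the `candidates` registry are all invented for illustration and are not xarray's real code.

```python
import importlib.util


class BaseBackend:
    """Hypothetical base class; ``available`` defaults to True."""

    available = True


class NoDepsBackend(BaseBackend):
    """Needs no optional packages, so it inherits the default."""


# find_spec returns None when a top-level module cannot be found.
has_optional = importlib.util.find_spec("hypothetical_io_library") is not None


class OptionalDepsBackend(BaseBackend):
    """Overrides the default with its own dependency flag."""

    available = has_optional


# Hypothetical registry, filtered down to the usable backends.
candidates = {"nodeps": NoDepsBackend, "optional": OptionalDepsBackend}
installed = {name: cls for name, cls in candidates.items() if cls.available}
print(sorted(installed))
```

With the default on the base class, only backends that depend on optional packages need to set the flag themselves.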

EDIT: (Actually we should wait to put tests in too really)

TomNicholas avatar Sep 28 '22 19:09 TomNicholas

We discussed this with Jessica in the xarray community meeting today

Did you discuss adding xr.show_engines() or are we pointing all users to xr.backends.list_engines()?

dcherian avatar Oct 03 '22 20:10 dcherian

Looks like they are - every subclass of BackendEntrypoint seems to define .available. In each case it points to the corresponding has_[package] flag. I think it might be safe to set .available to default to True in the base class? That way, if the backend requires no special dependencies, it would return True by default. If we think that makes sense then I'm happy to add a commit doing that so this can be merged.

EDIT: (Actually we should wait to put tests in too really)

In #7114 I have added this class attribute with default = True. Maybe we can merge this first?

Which tests did you want to see?

headtr1ck avatar Oct 04 '22 17:10 headtr1ck

@JessicaS11 if you merge master and solve the conflicts this should be enough for a merge :)

headtr1ck avatar Oct 11 '22 18:10 headtr1ck

Couple of minor comments, then we can merge.

headtr1ck avatar Oct 14 '22 06:10 headtr1ck

Couple of minor comments, then we can merge.

@headtr1ck GitHub is unhelpfully not showing me any comments. Can you share a screenshot or something so I can address them? Thanks!

JessicaS11 avatar Oct 14 '22 13:10 JessicaS11

@headtr1ck GitHub is unhelpfully not showing me any comments. Can you share a screenshot or something so I can address them? Thanks!

Forgot to click the ok button, haha

headtr1ck avatar Oct 14 '22 14:10 headtr1ck

@headtr1ck the typing changes didn't work. Can you send in a separate PR for that please?

dcherian avatar Oct 17 '22 16:10 dcherian

@headtr1ck the typing changes didn't work. Can you send in a separate PR for that please?

Ok, weird. But can do :)

headtr1ck avatar Oct 17 '22 16:10 headtr1ck