xarray
list available backends and basic descriptors
Wanted to garner maintainer feedback on the basic implementation before I add any docs. I will need some guidance for adding tests.
- [x] Closes #6577
- [ ] Tests added
- [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
- [x] New functions/methods are listed in api.rst
I know that it is more work but I think it would be more beneficial to have this information formatted nicely than simply returning some lists. Maybe similar to a pandas dataframe?
I know that it is more work but I think it would be more beneficial to have this information formatted nicely than simply returning some lists.
Sounds good - I'll make that happen!
Hey @JessicaS11! Nice to see you here. This is going to be a great first-time contribution!
I know that it is more work but I think it would be more beneficial to have this information formatted nicely than simply returning some lists. Maybe similar to a pandas dataframe?
One approach might be to add a nice __str__, __repr__, (or even _repr_html_) to the BackendEntrypoint class. That way printing a list (or better a dict) of the engine-specific subclasses would automatically look decent. Something roughly like:
class BackendEntrypoint:
    ...

    def __str__(self) -> str:
        # Start from the description, then append the docs link if one is set.
        txt = self.backend_description
        if self.backend_url:
            txt += f"\nFind more info at {self.backend_url}"
        return txt
What might be clearer is if this backend class knew its own name, but I'm not sure it can, hmmm
One approach might be to add a nice __str__, __repr__, (or even _repr_html_) to the BackendEntrypoint class. That way printing a list (or better a dict) of the engine-specific subclasses would automatically look decent.
That would not display in a nice table though. But it might be nice to add anyway ;)
What might be clearer is if this backend class knew its own name, but I'm not sure it can, hmmm
What about type(self).__name__?
That would not display in a nice table though. But it might be nice to add anyway ;)
So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?
What about type(self).__name__?
Correct me if I'm wrong, but if a subclassed ZarrBackend object was added as an entrypoint under the key "zarr", wouldn't type(self).__name__ return "ZarrBackend" rather than "zarr", even though "zarr" is what the user would actually have to pass to the engine kwarg of open_dataset to use that backend? Not ideal :/
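A minimal sketch of the mismatch described here. The class and the plain dict are hypothetical stand-ins; they only imitate a name-to-backend mapping and are not xarray's actual registry:

```python
class ZarrBackend:
    """Hypothetical stand-in for a backend entrypoint subclass."""

    def class_name(self) -> str:
        # An instance can always discover the name of its own class...
        return type(self).__name__


# ...but the engine key is stored by whoever registers the backend,
# not on the class itself. This dict is an assumption for illustration.
engines = {"zarr": ZarrBackend()}

for key, backend in engines.items():
    print(f"engine key: {key!r}, class name: {backend.class_name()!r}")
```

The key a user would pass as engine and the class name are independent strings, so any display that wants the key has to receive it from the mapping.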
Correct me if I'm wrong, but if a subclassed ZarrBackend object was added as an entrypoint under the key "zarr", wouldn't type(self).__name__ return "ZarrBackend" rather than "zarr", even though "zarr" is what the user would actually have to pass to the engine kwarg of open_dataset to use that backend? Not ideal :/
You're right. But this mapping has to be somewhere.
That would not display in a nice table though. But it might be nice to add anyway ;)
Agreed. I added a __str__ to the BackendEntrypoint class, but...
So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?
... I haven't dreamed up an answer to this question yet. We could set it up so that __str__ instead returns a pandas series, but I don't see another way we can have a pretty __str__ that will also directly feed into a nicely formatted table for avail_engines().
So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?
Does it need to be the same method? Why not a method that simply returns a list of classes and another method that prints a nicely formatted list?
Agreed. I added a __str__ to the BackendEntrypoint class, but...
This is helpful already, thank you.
Does it need to be the same method? Why not a method that simply returns a list of classes and another method that prints a nicely formatted list?
I guess not, but we also probably don't want to pollute xarray's public namespace with multiple functions that do basically the same thing. Though maybe as it's all behind the backends namespace that doesn't matter so much? Then we might have list_backends and display_backends or something?
we also probably don't want to pollute xarray's public namespace with multiple functions that do basically the same thing. Though maybe as it's all behind the backends namespace that doesn't matter so much? Then we might have list_backends and display_backends or something?
Currently there is this method:
>>> xr.backends.plugins.list_engines()
{'scipy': <xarray.backends.scipy_.ScipyBackendEntrypoint object at 0x101ec4ee0>, 'rasterio': <rioxarray.xarray_plugin.RasterioBackend object at 0x10217de70>, 'store': <xarray.backends.store.StoreBackendEntrypoint object at 0x10217de10>}
and here we're adding (essentially to more effectively and prettily surface the above results to the user):
>>> xr.backends.api.avail_engines()
Engine Description Documentation
scipy
rasterio
store
The goal that led us (@snowman2 @dcherian @scottyhq) to create the initial issue was giving the user an easy, obvious way to find out what engines were actually available to them in their current environment. In the process of figuring out how to do this, @snowman2 and I discovered that list_engines() already existed; we just needed to add the attributes we wanted to display, make it pretty, and put it somewhere a user might look for it. We can certainly change the name of avail_engines() to display_backends(). Is there then a dev need to have another version of list_backends(), or would a "see also" to list_engines() be a solution here?
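As a rough illustration of the "make it pretty" step, here is a sketch that renders a name-to-backend dict as a plain-text table without pandas. FakeBackend, its description and url fields, and format_engine_table are hypothetical names for this example only; the descriptions are placeholders, not real backend metadata:

```python
from dataclasses import dataclass


@dataclass
class FakeBackend:
    """Hypothetical stand-in for a backend object (not xarray's class)."""

    description: str = ""
    url: str = ""


def format_engine_table(engines: dict) -> str:
    """Render a name -> backend mapping as an aligned plain-text table."""
    header = ("Engine", "Description", "Documentation")
    rows = [(name, b.description, b.url) for name, b in engines.items()]
    table_rows = [header, *rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table_rows) for i in range(3)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table_rows
    ]
    return "\n".join(lines)


engines = {
    "scipy": FakeBackend(description="placeholder description"),
    "store": FakeBackend(),
}
table = format_engine_table(engines)
print(table)
```

Keeping the formatter separate from the function that returns the dict is one way to serve both use cases: callers who want objects use the mapping, and callers who want a listing print the table.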
We can certainly change the name of avail_engines() to display_backends(). Is there then a dev need to have another version of list_backends(), or would a "see also" to list_engines() be a solution here?
I think exposing a backends.display_backends() (defined in backends.api like BackendEntrypoint is) and then also pointing from there to the lower-level xr.backends.plugins.list_engines() would be great. That way there is only one more function listed in xarray's main API docs, and list_backends can be for dev use if necessary.
Expanding on that, we could put display_backends in the API docs under IO / conversion and add list_backends under Advanced API. That would keep distinct the two use cases of (1) "I'm a user and want to know what backends are installed for use as engine=...", and (2) "I'm a developer who is trying to add a new backend and I want to see all the actual BackendEntrypoint objects".
Thanks Jessica!
"I'm a developer who is trying to add a new backend and I want to see all the actual BackendEntrypoint objects".
We could also just document using xr.backends.plugins.list_engines() in https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html and https://tutorial.xarray.dev/advanced/backends/backends.html
>>> xr.backends.plugins.list_engines()
{'scipy': <xarray.backends.scipy_.ScipyBackendEntrypoint object at 0x101ec4ee0>, 'rasterio': <rioxarray.xarray_plugin.RasterioBackend object at 0x10217de70>, 'store': <xarray.backends.store.StoreBackendEntrypoint object at 0x10217de10>}
I wonder: if these classes had a proper repr, wouldn't that be enough?
add list_backends under Advanced API
Turns out it was already available so I added it to the Advanced API docs as well as https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html.
@dcherian I'll bookmark adding it to https://tutorial.xarray.dev/advanced/backends/backends.html.
@Illviljan, in case you didn't see this prior discussion:
So what would the return type need to be to get a nice table representation, but also allow you to select out individual backend objects? A pandas object containing the backend objects? Some kind of BackendList class?
... I haven't dreamed up an answer to this question yet. We could set it up so that __str__ instead returns a pandas series, but I don't see another way we can have a pretty __str__ that will also directly feed into a nicely formatted table for avail_engines().
Do you have any suggestions?
Note: avail_engines() has been renamed to show_engines()
My 2 cents is that list_engines should be good enough and the issue is the lack of a good repr on the backend classes.
If the dict repr from list_engines is still not good enough maybe replacing the dict with a custom MutableMapping with a good repr format could be a way forward?
I think show_engines provides a simple user friendly interface that is valuable for users in its current state.
If the dict repr from list_engines is still not good enough maybe replacing the dict with a custom MutableMapping with a good repr format could be a way forward?
I like this idea more than using a pd.DataFrame, which is quite a heavy dependency to use for just displaying some strings. (We have other plans to eventually make pandas an optional import too). That would also be consistent with the approach we use for ds.coords and similar.
EDIT: This would also be a good thing to split out into an (optional) future PR though, and merge returning a dict of objects with nice individual reprs for now.
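A minimal sketch of the custom-mapping idea, assuming a hypothetical EngineMapping class (not part of xarray). It behaves like a dict, so individual backends can still be selected by key, but its repr prints one engine per line:

```python
from collections.abc import MutableMapping


class EngineMapping(MutableMapping):
    """Hypothetical dict-like container with a one-engine-per-line repr."""

    def __init__(self, engines=None):
        self._engines = dict(engines or {})

    def __getitem__(self, key):
        return self._engines[key]

    def __setitem__(self, key, value):
        self._engines[key] = value

    def __delitem__(self, key):
        del self._engines[key]

    def __iter__(self):
        return iter(self._engines)

    def __len__(self):
        return len(self._engines)

    def __repr__(self):
        # Header line followed by "name: <repr of backend>" for each entry.
        lines = [f"<{type(self).__name__} ({len(self)} engines)>"]
        lines += [f"  {name}: {backend!r}" for name, backend in self.items()]
        return "\n".join(lines)


# The values here are placeholder strings standing in for backend objects.
mapping = EngineMapping({"scipy": "placeholder backend", "store": "placeholder backend"})
print(mapping)
```

Because each line reuses the repr of the stored object, a good repr on the backend classes and a readable container repr compose without any pandas dependency.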
We have other plans to eventually make pandas an optional import too
That makes a strong case for not using a data frame.
We discussed this with Jessica in the xarray community meeting today and I think the conclusion was to return a dict of backend objects with nice reprs to begin with, and if there is appetite then prettify it later, likely with a custom mutable mapping with special repr.
Does anyone know if the backend entrypoints are supposed to have an available attribute?
This:
https://github.com/pydata/xarray/blob/212a5d7909e8dd54446b08574a0683e2477f2b40/xarray/backends/plugins.py#L89
seems to require one.
Does anyone know if the backend entrypoints are supposed to have an available attribute?
Looks like they are - every subclass of BackendEntrypoint seems to define .available. In each case it points to the corresponding has_[package] flag. I think it might be safe to set .available to default to True in the base class? That way if the backend requires no special dependencies it would by default return True. If we think that makes sense then I'm happy to add a commit doing that so this can be merged.
EDIT: (Actually we should wait to put tests in too really)
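The proposed default can be sketched with toy classes. Everything here (BaseBackend, FancyBackend, PlainBackend, has_fancylib, installed) is a hypothetical stand-in for illustration and does not reproduce xarray's actual code:

```python
class BaseBackend:
    """Hypothetical stand-in for the base entrypoint class."""

    # Proposed default: a backend counts as available unless it says otherwise.
    available = True


# Stand-in for a has_[package] flag computed at import time.
has_fancylib = False


class FancyBackend(BaseBackend):
    # Backends with an optional dependency override the default.
    available = has_fancylib


class PlainBackend(BaseBackend):
    # No special dependencies, so the inherited default applies.
    pass


def installed(engines: dict) -> dict:
    """Keep only the backends whose available flag is truthy."""
    return {name: backend for name, backend in engines.items() if backend.available}


usable = installed({"fancy": FancyBackend(), "plain": PlainBackend()})
print(list(usable))
```

With the default on the base class, a filter like the one above never hits a missing attribute for third-party backends that do not define the flag themselves.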
We discussed this with Jessica in the xarray community meeting today
Did you discuss adding xr.show_engines() or are we pointing all users to xr.backends.list_engines()?
Looks like they are - every subclass of BackendEntrypoint seems to define .available. In each case it points to the corresponding has_[package] flag. I think it might be safe to set .available to default to True in the base class? That way if the backend requires no special dependencies it would by default return True. If we think that makes sense then I'm happy to add a commit doing that so this can be merged.
EDIT: (Actually we should wait to put tests in too really)
In #7114 I have added this class attribute with default = True. Maybe we can merge this first?
Which tests did you want to see?
@JessicaS11 if you merge master and solve the conflicts this should be enough for a merge :)
Couple of minor comments, then we can merge.
@headtr1ck GitHub is unhelpfully not showing me any comments. Can you share a screen shot or something so I can address them? Thanks!
Forgot to click the ok button, haha
@headtr1ck the typing changes didn't work. Can you send in a separate PR for that please?
Ok, weird. But can do :)