Pass arbitrary options to sel()
Is your feature request related to a problem?
Currently .sel() accepts two options: method and tolerance. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes.
It would also be useful for custom indexes to expose their own selection options, e.g.,
- index query optimization like the dualtree flag of sklearn.neighbors.KDTree.query
- k-nearest neighbors selection with the creation of a new "k" dimension (+ coordinate / index) with user-defined name and size.
From #3223, it would be nice if we could also pass distinct options values per index.
What would be a good API for that?
Describe the solution you'd like
Some ideas:
A. Allow passing a tuple (labels, options_dict) as indexer value
ds.sel(x=([0, 2], {"method": "nearest"}), y=3)
B. Expose an options kwarg that would accept a nested dict
ds.sel(x=[0, 2], y=3, options={"x": {"method": "nearest"}})
Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.
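To compare the two spellings, here is a minimal sketch of how they could both be reduced to the same internal form. `normalize_indexers` is a hypothetical helper written for this discussion, not an existing xarray function, and the rule used to detect the tuple form is only one possible choice.

```python
from __future__ import annotations

from typing import Any, Mapping


def normalize_indexers(
    indexers: Mapping[str, Any],
    options: Mapping[str, Mapping[str, Any]] | None = None,
) -> dict[str, tuple[Any, dict[str, Any]]]:
    """Hypothetical helper: return {coord_name: (labels, options)} for idea A or B."""
    options = options or {}
    unknown = set(options) - set(indexers)
    if unknown:
        raise KeyError(f"options given for coordinates without labels: {sorted(unknown)}")

    normalized = {}
    for name, value in indexers.items():
        # Idea A: a 2-tuple whose second item is a dict is read as (labels, options).
        if isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], dict):
            labels, inline = value
        else:
            labels, inline = value, {}
        # Idea B: options from the nested dict are merged on top of inline ones.
        normalized[name] = (labels, {**inline, **options.get(name, {})})
    return normalized


style_a = normalize_indexers({"x": ([0, 2], {"method": "nearest"}), "y": 3})
style_b = normalize_indexers({"x": [0, 2], "y": 3}, options={"x": {"method": "nearest"}})
```

Both calls produce the same mapping, so the choice is mostly about readability. One thing the sketch makes visible: idea A is ambiguous whenever the labels themselves are a 2-tuple ending in a dict, while idea B has no such corner case.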
Any other ideas? Some sort of context manager? Some Index specific API?
Describe alternatives you've considered
The API proposed in #3223 would look great if method and tolerance were the only accepted options, but less so for arbitrary options.
Another difficulty regarding multi-coordinate indexes: ideally options should be set per index, not per coordinate.
Or we could simply decide that .sel() should not accept arbitrary options and handle special cases, e.g., via accessors.
It would actually make sense to have something like .my_accessor.sel_k_neighbors(). Not so great to have a separate method just for an optimization option, though.
Another option would be to allow passing a custom object, like:
class Indexer:
    def __init__(self, indexer, **options):
        ...

ds.sel(x=Indexer([0, 2], method="nearest"))
I think we wanted to have something like that, anyways, to be able to specify other behaviors of a slice, like right-exclusive?
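A rough sketch of how such a wrapper could be unpacked inside a sel-like method, assuming bare labels keep working as they do today. `unwrap` and `sel_sketch` are hypothetical names, and `closed="left"` is only an illustrative spelling for a right-exclusive slice, not an existing option.

```python
class Indexer:
    """Hypothetical wrapper pairing labels with per-coordinate options."""

    def __init__(self, indexer, **options):
        self.indexer = indexer
        self.options = options


def unwrap(value):
    """Return (labels, options); bare labels get an empty options dict."""
    if isinstance(value, Indexer):
        return value.indexer, dict(value.options)
    return value, {}


def sel_sketch(**indexers):
    """Stand-in for .sel(): only shows how labels and options would be split."""
    return {name: unwrap(value) for name, value in indexers.items()}


parsed = sel_sketch(
    x=Indexer([0, 2], method="nearest"),
    y=3,  # plain labels, no options
    t=Indexer(slice(0, 10), closed="left"),  # hypothetical right-exclusive slice
)
```

Because the options travel with the labels, the existing keyword-argument form of .sel() stays unchanged and each index only sees the options meant for it.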
Or use Indexer objects to group labels + options? This is slightly different than what you suggest:
class Dataset:
    def sel(
        self,
        indexers: Mapping[Any, Any] | Indexer | Iterable[Indexer],
        **indexers_kwargs: Any,
    ):
        ...

class Indexer:
    def __init__(self, labels=None, options=None, **label_kwargs):
        ...
Let's assume a Dataset with lat / lon coordinates both sharing the same geographic index + another time dimension coordinate, then we could write:
indexers = [
    Indexer(lon=[2, 15], lat=[45, 48], options={"foo": "bar"}),
    Indexer(time="2022-01-01"),
]
ds.sel(indexers)
This could also be used to avoid code duplication when using common selection options for different indexes.
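As a sketch of how those grouped indexers might be dispatched per index rather than per coordinate: the `coord_to_index` mapping and the `group_by_index` helper below are assumptions made for illustration (in xarray the coordinate-to-index relationship would come from the Dataset itself), and raising on an index targeted twice is just one possible policy.

```python
class Indexer:
    """Hypothetical container grouping labels with the options that apply to them."""

    def __init__(self, labels=None, options=None, **label_kwargs):
        self.labels = {**(labels or {}), **label_kwargs}
        self.options = dict(options or {})


def group_by_index(indexers, coord_to_index):
    """Return {index_name: (labels, options)}, so options are attached per index."""
    grouped = {}
    for indexer in indexers:
        # Collect this Indexer's labels under the index that owns each coordinate.
        per_index = {}
        for name, labels in indexer.labels.items():
            per_index.setdefault(coord_to_index[name], {})[name] = labels
        for index_name, labels in per_index.items():
            if index_name in grouped:
                raise ValueError(f"index {index_name!r} is targeted by more than one Indexer")
            # Every index touched by this Indexer receives a copy of its options.
            grouped[index_name] = (labels, dict(indexer.options))
    return grouped


# Assumed layout: lon and lat share one geographic index, time has its own.
coord_to_index = {"lon": "geo", "lat": "geo", "time": "time"}

grouped = group_by_index(
    [
        Indexer(lon=[2, 15], lat=[45, 48], options={"foo": "bar"}),
        Indexer(time="2022-01-01"),
    ],
    coord_to_index,
)

# One Indexer spanning two indexes shares its options between them.
shared = group_by_index(
    [Indexer(lon=2, lat=45, time="2022-01-01", options={"method": "nearest"})],
    coord_to_index,
)
```

Here the "geo" index receives both lon and lat labels together with a single options dict, which is the per-index behaviour asked for above, and the second call shows the same options being reused across indexes without repeating them.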