pystac
Catalog.get_item(id) with non-unique IDs
According to the spec: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#item-fields
There can be Items that share an ID across a Catalog. A Collection inherits get_item, which works as intended (with recursive=True) since there will be only one matching Item. A Catalog, however, returns the first matching Item it finds in the first collection it comes across.
This behavior seems reasonable as long as it is understood (I think a warning in the documentation specifying the behavior would be enough). However, I'm having trouble deciding how get_item should work on pystac-client's Client. There it makes sense, if the API conforms to the item search conformance class, to use the /search endpoint to get the Item for an ID rather than recursively walking the catalog, which won't be possible if the API is not browsable.
In that case, more than one Item may be returned. Should get_item return an array, making the response different from pystac.Catalog's? Should it return a single Item, ignoring any additional Items? Or should it remain untouched, with users instead using the Client.search function to search by ID?
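For illustration, here is a minimal sketch of the ambiguity with the current Catalog.get_item; the catalog, collection, and item names are made up:

```python
from datetime import datetime

import pystac

# Hypothetical layout: one catalog with two collections, each containing a
# distinct Item that happens to use the ID "scene-001".
catalog = pystac.Catalog(id="root", description="Demo catalog")

for name in ("collection-a", "collection-b"):
    collection = pystac.Collection(
        id=name,
        description=f"Demo {name}",
        extent=pystac.Extent(
            spatial=pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
            temporal=pystac.TemporalExtent([[None, None]]),
        ),
    )
    collection.add_item(
        pystac.Item(
            id="scene-001",
            geometry=None,
            bbox=None,
            datetime=datetime(2022, 1, 1),
            properties={},
        )
    )
    catalog.add_child(collection)

# Catalog.get_item stops at the first match, so which "scene-001" you get
# depends entirely on traversal order.
found = catalog.get_item("scene-001", recursive=True)
print(found.get_parent().id if found else None)
```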
This behavior seems reasonable as long as it is understood (I think a warning in the documentation specifying the behavior would be enough).
I think this would be a good approach for PySTAC. In the long run, we may actually want to remove this method from Catalog and only include it on Collection, since that is the only case where it is guaranteed to return a single Item. That would obviously be a breaking change, though, so it would need to wait for PySTAC v2.
There it makes sense, if the API conforms to the item search conformance class, to use the /search endpoint to get the Item for an ID rather than recursively walking the catalog, which won't be possible if the API is not browsable.
This seems like the best approach for pystac-client. I can't think of a reason why you would want to fall back to recursively crawling the catalog from an API landing page, since a conformant API should make the same Item available via the /search endpoint.
Should get_item return an array, making the response different from pystac.Catalog's?
I think this is going to cause typing problems because it would violate the Liskov substitution principle. It seems like the safest approach would be to return the first Item that has the given ID and add a warning, similar to what we're talking about adding in PySTAC, to make sure users are aware of the possible inconsistency in results.
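As a rough sketch of that idea (this is not the pystac-client implementation; the helper name and endpoint URL are made up, and the items() iterator assumes a reasonably recent pystac-client):

```python
import warnings
from typing import Optional

import pystac
from pystac_client import Client


def get_item_via_search(client: Client, item_id: str) -> Optional[pystac.Item]:
    # Sketch: resolve an ID through the /search endpoint instead of crawling
    # the catalog, and warn if the ID turns out not to be unique.
    results = list(client.search(ids=[item_id]).items())
    if len(results) > 1:
        warnings.warn(
            f"{len(results)} Items share the ID {item_id!r}; returning the first one"
        )
    return results[0] if results else None


# Usage against an API that advertises the item search conformance class:
# client = Client.open("https://example.com/stac/v1")  # hypothetical endpoint
# item = get_item_via_search(client, "scene-001")
```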
For PySTAC, I propose removing (well, deprecating then removing) Catalog.get_item in favor of Catalog.get_items(id: str) -> Iterator[Item] -- since there's no guarantee around the number of items with a given id in a Catalog, we shouldn't imply there is with a get_item method. We can document using next(get_items(id), None) for users that want to get the first Item. Collection should retain get_item.
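A sketch of the proposed semantics, written as a standalone function rather than the eventual Catalog method (get_items_by_id is a made-up name):

```python
from typing import Iterator

import pystac


def get_items_by_id(catalog: pystac.Catalog, id: str) -> Iterator[pystac.Item]:
    # Proposed behavior: yield every Item in the catalog whose ID matches,
    # making no promise that there is exactly one.
    return (item for item in catalog.get_all_items() if item.id == id)


# Documented pattern for users who only want the first match (or None):
# first = next(get_items_by_id(catalog, "scene-001"), None)
```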
I was just going to pick this up and I have a few questions.
- Should get_items accept multiple IDs, since the result will be an iterable anyways?
- Should there be a new recursive kwarg on get_items to support the old get_item(id, recursive=True)?
- And if so, does that mean that get_all_items should be deprecated as well in favor of get_items(recursive=True)?
Should get_items accept multiple IDs, since the result will be an iterable anyways?
My instinct is no, but I could be talked out of it. My bias is to keep the function signature as simple as possible, and so I prefer def get_items(self, id: Optional[str] = None, recursive: bool = False) over something like ..., id: Optional[str | Iterable[str]], .... A user could still do [item for item in catalog.get_items() if item.id in ids] if they want the "ID in a list" functionality.
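For concreteness, a sketch of how the single-ID signature could be layered on the traversal methods pystac already has (again a standalone function, not the eventual method):

```python
from typing import Iterator, Optional

import pystac


def get_items(
    catalog: pystac.Catalog, id: Optional[str] = None, recursive: bool = False
) -> Iterator[pystac.Item]:
    # Sketch of the simpler signature: one optional ID filter plus a recursive
    # flag; get_items()/get_all_items() are the existing traversal helpers.
    items = catalog.get_all_items() if recursive else catalog.get_items()
    for item in items:
        if id is None or item.id == id:
            yield item
```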
Should there be a new recursive kwarg on get_items to support the old get_item(id, recursive=True)?
Yes, I think so. We'll want to keep the "walk the whole Catalog" functionality.
and if so does that mean that get_all_items should be deprecated as well in favor of get_items(recursive=True)?
Yes? To me, having get_items and get_all_items violates "There should be one-- and preferably only one --obvious way to do it."
Should get_items accept multiple IDs, since the result will be an iterable anyways?
My instinct is no, but I could be talked out of it. My bias is to keep the function signature as simple as possible, and so I prefer def get_items(self, id: Optional[str] = None, recursive: bool = False) over something like ..., id: Optional[str | Iterable[str]], .... A user could still do [item for item in catalog.get_items() if item.id in ids] if they want the "ID in a list" functionality.
Yeah I guess I was imagining something more like def get_items(self, *ids: str, recursive: bool = False)
so that 1 id would feel just as easy. But I'm not super attached to the idea.
def get_items(self, *ids: str, recursive: bool = False)
🤔 I'm not against this ... I don't often reach for *args, but maybe this would be a good use for it. 👍🏽
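A usage sketch assuming the *ids spelling were adopted (hypothetical at this point in the discussion; the IDs are made up):

```python
# Zero, one, or many IDs all flow through the same call:
all_items = list(catalog.get_items(recursive=True))
first = next(catalog.get_items("scene-001", recursive=True), None)
several = list(catalog.get_items("scene-001", "scene-002", recursive=True))
```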
I am going to start on this today.
For collections we are keeping get_item but not allowing recursive=True, right? Because that would also violate the uniqueness constraint, since a collection can contain a catalog or another collection.
I guess by the same logic, though, get_item on a catalog should be valid as long as recursive=False.
For collections we are keeping get_item but not allowing recursive=True right?
Collection.get_item can be recursive, since (as you note) collections can contain catalogs or other collections. The uniqueness constraint can still hold even if there are nested catalogs/collections.
I guess by the same logic though get_item on catalog should be valid as long as recursive=False.
Not necessarily; two Items could be at different hrefs but have the same ID.
Closed by #1075