pystac icon indicating copy to clipboard operation
pystac copied to clipboard

Catalog.get_item(id) with non-unique IDs

Open matthewhanson opened this issue 3 years ago • 2 comments

According to the spec: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#item-fields

There can Items that share an ID across a Catalog. Collection inherits get_item which would work as intended (with recursive=True) since there will be only one matching Item. However, a Catalog would return the first matching Item it would find in the first collection it comes across.

This behavior seems reasonable as long as it is understood (I think a warning in documentation specifying the behavior). However I'm having problem deciding how get_item should work in pystac-client.Client. There it makes sense, if the API conforms to the item search Conformance Class), to instead use the /search endpoint to get the item for an ID rather than recursively go through the catalog, which it won't be able to do if it's not a browsing API.

In this case, there may be more than one Item returned. Should get_item return an array, making the response different than pystac.Catalog? Should it return a single item, ignoring any additional Items? Or should it remain untouched and users should instead use the Client.search function to search by ID?

matthewhanson avatar Jul 29 '21 05:07 matthewhanson

This behavior seems reasonable as long as it is understood (I think a warning in documentation specifying the behavior).

I think this would be a good approach for PySTAC. In the long run, we may actually want to remove this method from Catalog and only include it on Collection, since that is the only case where it is guaranteed to return a single Item. That would obviously be a breaking change, though, so it would need to wait for PySTAC v2.

There it makes sense, if the API conforms to the item search Conformance Class), to instead use the /search endpoint to get the item for an ID rather than recursively go through the catalog, which it won't be able to do if it's not a browsing API.

This seems like the best approach for pystac-client. I can't think of a reason why you would want to fall back to recursively crawling the catalog for an API landing page, since a conformant API should make the same Item available via the /search endpoint.

Should get_item return an array, making the response different than pystac.Catalog?

I think this is going to cause typing problems because it would violate the Liskov subtitution principle. It seems like the safest approach would be to return the first Item that has the given ID and add a warning similar to what we're talking about adding in PySTAC to make sure users are aware of the possible inconsistency in results.

duckontheweb avatar Jul 30 '21 01:07 duckontheweb

Fory PySTAC, I propose removing (well, deprecating then removing) Catalog.get_item in favor of Catalog.get_items(id: str) -> Iterator[Item] -- since there's no guarantee around the number of items with a given id in a Catalog, we shouldn't imply there is with a get_item method. We can document using next(get_item(id), None) for users that want to get the first item.

Collection should retain the get_item.

gadomski avatar Nov 08 '22 23:11 gadomski

I was just going to pick this up and I have a few questions.

  1. Should get_items accept multiple ids since the result will get an iterable anyways?
  2. Should there be a new recursive kwarg on get_items to support the old get_item(id, recursive=True)?
    • and if so does that mean that get_all_items should be deprecated as well in favor of get_items(recursive=True)?

jsignell avatar Mar 27 '23 18:03 jsignell

Should get_items accept multiple ids since the result will get an iterable anyways?

My instinct is no, but I could be talked out of it. My bias is to keep the function signature as simple as possible, and so I prefer def get_items(self, id: Optional[str] = None, recursive: bool = False) over ..., id: Optional[str | Iterable[str]], .... A user could still do list(item.id in ids for item in catalog.get_items()) if they want the in List functionality.

Should there be a new recursive kwarg on get_items to support the old get_item(id, recursive=True)?

Yes, I think so. We'll want to keep the "walk the whole Catalog" functionality.

and if so does that mean that get_all_items should be deprecated as well in favor of get_items(recursive=True)?

Yes? To me, having get_items and get_all_items violates "There should be one-- and preferably only one --obvious way to do it."

gadomski avatar Mar 30 '23 13:03 gadomski

Should get_items accept multiple ids since the result will get an iterable anyways?

My instinct is no, but I could be talked out of it. My bias is to keep the function signature as simple as possible, and so I prefer def get_items(self, id: Optional[str] = None, recursive: bool = False) over ..., id: Optional[str | Iterable[str]], .... A user could still do list(item.id in ids for item in catalog.get_items()) if they want the in List functionality.

Yeah I guess I was imagining something more like def get_items(self, *ids: str, recursive: bool = False) so that 1 id would feel just as easy. But I'm not super attached to the idea.

jsignell avatar Mar 30 '23 13:03 jsignell

def get_items(self, *ids: str, recursive: bool = False)

🤔 I'm not against this ... I don't often reach for *args but maybe this would be a good use for it. 👍🏽

gadomski avatar Mar 30 '23 13:03 gadomski

I am going to start on this today.

jsignell avatar Mar 31 '23 14:03 jsignell

For collections we are keeping get_item but not allowing recursive=True right? Because that would also violates the uniqueness constraint. Since a collection can contain a catalog or another collection.

jsignell avatar Mar 31 '23 15:03 jsignell

I guess by the same logic though get_item on catalog should be valid as long as recursive=False.

jsignell avatar Mar 31 '23 15:03 jsignell

For collections we are keeping get_item but not allowing recursive=True right?

Collection.get_item can be recursive, since (as you note) collections can contain catalogs or other collections. The uniqueness constraint can still hold even if there's nested catalogs/collection.

I guess by the same logic though get_item on catalog should be valid as long as recursive=False.

Not necessarily, two items could be at different hrefs but have the same id.

gadomski avatar Mar 31 '23 16:03 gadomski

Closed by #1075

jsignell avatar Apr 05 '23 14:04 jsignell