pystac-client
Different numbers of STAC items returned between version `0.7.3` and `>=0.7.4`
Hi all, we've recently had users report missing items in results from our Digital Earth Australia STAC API (https://explorer.sandbox.dea.ga.gov.au/stac). The data does exist; however, pystac_client returns only a small proportion of the matching STAC items.
Looking into this, the issue appears to occur only on the most recent versions of pystac_client; version 0.7.3 and earlier return all relevant results for a query, as expected.
pystac_client version 0.7.3
For example, on pystac_client==0.7.3, the following query returns the expected 37 Sentinel-2 scenes:
import pystac_client
client = pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac")
collections = ["ga_s2am_ard_3", "ga_s2bm_ard_3"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)
len([i.properties["datetime"] for i in query.items()])
pystac_client version 0.7.4 and above
However, on pystac_client==0.7.4 and above, only 20 items are returned for exactly the same query.
(A workaround is to pass a high limit manually, e.g. limit=1000. However, this feels unnecessary and is not something our users have had to do in the past.)
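For reference, the workaround can be sketched as below. The helper name `count_items` is hypothetical and exists only for illustration; it takes an already opened client and repeats the query from this issue. As far as I understand, `limit` in pystac-client is the page size requested from the server, not a cap on the total, so a large value only hides the problem by fitting every match on the first page.

```python
def count_items(client, limit=None):
    """Count the items matching the query from this issue.

    `client` is expected to behave like the object returned by
    pystac_client.Client.open(...). `limit` is forwarded as the requested
    page size only when it is given, so the default call relies on the
    server's own page size.
    """
    kwargs = {
        "collections": ["ga_s2am_ard_3", "ga_s2bm_ard_3"],
        "bbox": [146.04, -34.30, 146.05, -34.28],
        "datetime": "2023-12-01/2024-02-28",
    }
    if limit is not None:
        kwargs["limit"] = limit
    search = client.search(**kwargs)
    # items() walks every page the server advertises via "next" links
    return sum(1 for _ in search.items())
```

With a real client this would be called as `count_items(pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac"), limit=1000)`, and compared against `count_items(...)` without a limit.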
In case it's useful, our STAC API implementation is located here: https://github.com/opendatacube/datacube-explorer/blob/develop/cubedash/_stac.py
In v0.7.4 we removed the default limit from pystac-client (https://github.com/stac-utils/pystac-client/pull/584) because it makes more sense to trust the server's default limit. You noticed the change because pagination appears to be broken for your API: the first page advertises a next link, but following that link returns zero features:
$ cat data.json
{
  "bbox": [
    146.04,
    -34.3,
    146.05,
    -34.28
  ],
  "datetime": "2023-12-01T00:00:00Z/2024-02-28T23:59:59Z",
  "collections": [
    "ga_s2am_ard_3",
    "ga_s2bm_ard_3"
  ]
}
$ curl -s -X POST https://explorer.sandbox.dea.ga.gov.au/stac/search --json @data.json | jq '.links[0]'
{
  "rel": "next",
  "href": "https://explorer.sandbox.dea.ga.gov.au/stac/search?collections=ga_s2am_ard_3&collections=ga_s2bm_ard_3&bbox=146.04,-34.3,146.05,-34.28&time=2023-12-01T00%3A00%3A00%2B00%3A00%2F2024-02-28T23%3A59%3A59%2B00%3A00&limit=20&_o=20&_full=True"
}
$ curl -s https://explorer.sandbox.dea.ga.gov.au/stac/search\?collections\=ga_s2am_ard_3\&collections\=ga_s2bm_ard_3\&bbox\=146.04,-34.3,146.05,-34.28\&time\=2023-12-01T00%3A00%3A00%2B00%3A00%2F2024-02-28T23%3A59%3A59%2B00%3A00\&limit\=20\&_o\=20\&_full\=True | jq '.features | length'
0
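The same check can be scripted. The following is a minimal sketch using only the Python standard library, not how pystac-client pages internally; the function names `fetch_page`, `next_link` and `page_sizes` are made up for illustration. It assumes a link object may carry optional `method` and `body` fields for POST pagination, and treats a link without them as a plain GET.

```python
import json
import urllib.request
from typing import Callable, List, Optional


def fetch_page(link: dict) -> dict:
    """Fetch one search page described by a link object (GET unless it says POST)."""
    method = link.get("method", "GET").upper()
    data = None
    headers = {}
    if method == "POST":
        # Assumption: the link's "body" holds the JSON search parameters
        data = json.dumps(link.get("body", {})).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(
        link["href"], data=data, headers=headers, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def next_link(page: dict) -> Optional[dict]:
    """Return the first link with rel == "next", or None if there is none."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link
    return None


def page_sizes(
    first_link: dict,
    fetch: Callable[[dict], dict] = fetch_page,
    max_pages: int = 50,
) -> List[int]:
    """Follow "next" links from first_link and record the feature count per page."""
    sizes: List[int] = []
    link: Optional[dict] = first_link
    # max_pages guards against a server that keeps handing out next links
    while link is not None and len(sizes) < max_pages:
        page = fetch(link)
        sizes.append(len(page.get("features", [])))
        link = next_link(page)
    return sizes
```

Starting from a link with `href` set to the `/stac/search` endpoint, `method` set to `"POST"` and `body` set to the contents of data.json, a result such as `[20, 0]` would correspond to the behaviour shown above: a full first page followed by an empty one.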
One note: it's surprising to me that the next link would be a GET URL when the original search request came in as a POST.
Closing as not-a-pystac-client-issue.