cmr-stac
Search is very fast, but loading STAC Items is very slow
Performing a search can return >1000 items in less than 1 second. But loading those search results as STAC Items takes orders of magnitude longer (>1 min). Is this expected?
```python
%%time
import pystac_client  # pystac-client 0.3.0

URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)
results = catalog.search(
    collections=['NSIDC-0723.v4'],
    bbox='-54.85,69.31,-52.18,70.26',
    datetime='2000-01-01/2021-12-31',
)
print(f"{results.matched()} items found")  # 1387 items found, Wall time: 909 ms
```

```python
%%time
items = results.get_all_items()  # Wall time: 1min 5s
```
The matched() method just reads the total number of hits from the numberMatched field of the response, so it only makes a single request.
get_all_items(), on the other hand, pages through all of the results, so it can take a while. Also, by default it's probably only getting 10 items per page. You can probably speed this up by passing limit=100 to search(): that requests 100 items per page, so far fewer requests are needed to get all items.
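To illustrate why the two calls differ so much, here is a toy model of a paged search endpoint. `FakeStacApi`, `matched` and `get_all_items` below are hypothetical stand-ins written only for this illustration, not pystac_client's implementation. They simply count how many requests each approach makes, assuming the server honors the requested page size and defaults to 10 items per page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStacApi:
    """Hypothetical in-memory stand-in for a paged STAC /search endpoint."""
    total: int               # number of items matching the query
    default_limit: int = 10  # assumed page size when the client sends no limit
    requests: int = 0        # how many "HTTP requests" have been made so far

    def search_page(self, page: int, limit: Optional[int] = None) -> dict:
        self.requests += 1
        size = limit or self.default_limit
        start = page * size
        stop = min(start + size, self.total)
        return {
            "numberMatched": self.total,
            "features": [{"id": f"item-{i}"} for i in range(start, stop)],
            "has_next": stop < self.total,  # stands in for a rel="next" link
        }


def matched(api: FakeStacApi, limit: Optional[int] = None) -> int:
    # The total count is in the first response, so one request is enough.
    return api.search_page(0, limit)["numberMatched"]


def get_all_items(api: FakeStacApi, limit: Optional[int] = None) -> list:
    # Keep requesting the next page until the server reports none are left.
    items, page = [], 0
    while True:
        resp = api.search_page(page, limit)
        items.extend(resp["features"])
        if not resp["has_next"]:
            return items
        page += 1


for limit in (None, 100):
    api = FakeStacApi(total=1387)
    n = matched(api, limit)
    before = api.requests
    items = get_all_items(api, limit)
    print(f"limit={limit}: matched()={n} in {before} request, "
          f"get_all_items() -> {len(items)} items in {api.requests - before} requests")
```

With 1387 matches, reading the count takes 1 request, while paging through everything takes 139 requests at 10 items per page and 14 at 100 per page. If each CMR page cost roughly half a second, that would be consistent with the ~1 min observed, though that is an inference rather than a measurement.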
Thanks for the info, @matthewhanson. I haven't been able to get limit to work: https://github.com/nasa/cmr-stac/issues/202
Also, here is a quick comparison of two different endpoints returning about the same number of items (~600). Could there be different server-side defaults or performance differences?
```python
%%time
import pystac_client

URL = "https://earth-search.aws.element84.com/v0"
catalog = pystac_client.Client.open(URL)
results = catalog.search(
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2010-04-01/2021-12-31",
)
print(f"{results.matched()} items found")  # 604 items
stac_items = results.get_all_items()  # Wall time: 4.72 s
```

```python
%%time
URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)
results = catalog.search(
    collections=['NSIDC-0723.v4'],
    bbox='-54.85,69.31,-52.18,70.26',
    datetime='2019-02-09/2021-12-31',
)
print(f"{results.matched()} items found")  # 602 items
stac_items = results.get_all_items()  # Wall time: 27.2 s
```
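Normalizing the wall times reported in this thread by item count makes the comparison easier to read. The sketch below uses only those numbers; they are single runs, so network variance is not accounted for.

```python
# Wall-clock seconds and item counts copied from the runs in this thread.
runs = {
    "CMR NSIDC_ECS, 2000-2021": (65.0, 1387),  # 1 min 5 s
    "CMR NSIDC_ECS, 2019-2021": (27.2, 602),
    "Earth Search v0": (4.72, 604),
}

# Milliseconds spent per item loaded.
ms_per_item = {name: 1000 * secs / n for name, (secs, n) in runs.items()}
for name, ms in ms_per_item.items():
    print(f"{name}: {ms:.1f} ms/item")

ratio = ms_per_item["CMR NSIDC_ECS, 2019-2021"] / ms_per_item["Earth Search v0"]
print(f"CMR is ~{ratio:.1f}x slower per item")
```

Both CMR runs come out at roughly 45 to 47 ms per item regardless of the date range, about 5.8x the Earth Search figure. That fits per-page request latency dominating the cost, but these numbers alone cannot tell whether the difference comes from server response time or from different default page sizes.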