
Search is very fast, but loading STAC Items is very slow

Open scottyhq opened this issue 3 years ago • 2 comments

A search that matches more than 1000 items returns in under 1 second, but loading those results as STAC Items takes orders of magnitude longer (over 1 minute). Is this expected?

%%time
import pystac_client #0.3.0

URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)

results = catalog.search(
    collections=['NSIDC-0723.v4'],
    bbox='-54.85,69.31,-52.18,70.26',
    datetime='2000-01-01/2021-12-31',
)

print(f"{results.matched()} items found") # 1387 items found, Wall time: 909 ms

%%time
items = results.get_all_items() # Wall time: 1min 5s

scottyhq, Nov 11 '21 21:11

The initial matched() call only reads the total number of hits from the numberMatched field in the response, so it makes a single query.

Calling get_all_items actually pages through all of the results, so it can take a while. By default it's probably only fetching 10 items per page. You can probably speed this up by passing limit=100 to search: that requests 100 items per page, so far fewer requests are needed to retrieve all items.
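
To put rough numbers on that, here is a minimal sketch of the request count for the 1387 matched items above. The page size of 10 is the assumed default mentioned here, not something confirmed against the server, and pages_needed is a hypothetical helper, not part of pystac_client.

```python
import math


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of page requests needed to fetch every item (hypothetical helper)."""
    return math.ceil(total_items / page_size)


total = 1387  # value reported by results.matched() in the original post

# Compare the assumed default page size (10) with the suggested limit=100.
for page_size in (10, 100):
    print(f"page size {page_size}: {pages_needed(total, page_size)} requests")
```

If the per-request latency dominates, going from 139 requests to 14 should cut the total time by a similar factor, assuming the server actually honors the limit parameter.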

matthewhanson, Nov 11 '21 21:11

Thanks for the info @matthewhanson. I haven't been able to get limit to work: https://github.com/nasa/cmr-stac/issues/202

Also, here is a quick comparison of two different endpoints returning about the same number of items (~600). It makes me think there might be different server-side defaults or performance differences:

%%time

URL = "https://earth-search.aws.element84.com/v0"
catalog = pystac_client.Client.open(URL)

results = catalog.search(
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2010-04-01/2021-12-31"
)

print(f"{results.matched()} items found") # 604 items
stac_items = results.get_all_items() # Wall time: 4.72 s

%%time

URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)

results = catalog.search(
    collections=['NSIDC-0723.v4'],
    bbox='-54.85,69.31,-52.18,70.26',
    datetime='2019-02-09/2021-12-31',
)

print(f"{results.matched()} items found") # 602 items
stac_items = results.get_all_items() # Wall time: 27.2 s
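
Normalizing the two timings above by item count gives a rough per-item cost. This is only a back-of-the-envelope sketch: each wall time comes from a single run and includes opening the catalog and the matched() query, and seconds_per_item is a hypothetical helper.

```python
def seconds_per_item(wall_seconds: float, item_count: int) -> float:
    """Average wall time spent per returned item (hypothetical helper)."""
    return wall_seconds / item_count


# Wall times and item counts copied from the two cells above.
earth_search = seconds_per_item(4.72, 604)
cmr_stac = seconds_per_item(27.2, 602)
slowdown = cmr_stac / earth_search

print(f"earth-search: {earth_search * 1000:.1f} ms/item")
print(f"cmr-stac:     {cmr_stac * 1000:.1f} ms/item")
print(f"cmr-stac is about {slowdown:.1f}x slower per item")
```

That works out to roughly 8 ms versus 45 ms per item, so the gap is not explained by the number of items alone.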

scottyhq, Nov 11 '21 22:11