pystac-client

Search is creating multiple http requests

Open chiarch84 opened this issue 1 year ago • 3 comments

When I perform this search through pystac-client, it seems that instead of sending a single request to the API (GET /search), it first requests the landing page, then fetches all the collections, and only then performs the search. I verified this by inspecting the traffic going through the proxy, and it seemed quite odd. This is a problem in particular when I run simultaneous searches from parallel processes, since the web server treats the burst of requests as a kind of attack.

from pystac_client import Client

catalog = Client.open("...")
my_search = catalog.search(
    max_items=100,
    collections=['EO.Copernicus.S2.L2A'],
    bbox=[11.2, 46.4, 11.4, 46.5],
    query={"eo:cloud_cover": {"lt": 70}},
    datetime=['2023-01-01T00:00:00Z', '2023-01-02T00:00:00Z'],
    method='GET')

chiarch84 Jan 12 '24 15:01

I think that initial request is needed to check the conformance classes the API publishes. You might be able to disable it with ignore_conformance=True. The docs have a bit of info on conformance: https://pystac-client.readthedocs.io/en/stable/usage.html.

TomAugspurger Jan 12 '24 15:01

> this causes problems when I try to do simultaneous searches from parallel processes

That is a fair point. In the future it might make sense for pystac-client to make it easier to skip that initial GET.

In the meantime I took a look at the code and it doesn't look like ignore_conformance=True will skip the initial GET. That argument just makes it so the conformance classes are not considered when the user requests certain actions.

If you want to avoid any superfluous network calls, I would recommend skipping the Client object entirely and just using ItemSearch directly. Here is what that would look like:

from pystac_client import ItemSearch

search = ItemSearch(url="https://earth-search.aws.element84.com/v1/search", collections=["cop-dem-glo-30"], max_items=1)

Notice that the url ends in /search.

jsignell Jan 15 '24 14:01

Thanks for the suggestion! I will try!

chiarch84 Jan 15 '24 16:01