Search is creating multiple HTTP requests
When I run this search through pystac-client, it does not send a single GET /search request to the API. Instead, it first requests the landing page, then fetches all the collections, and only then performs the search. I verified this by inspecting the traffic going through a proxy, and it looked quite odd. In particular, this causes problems when I run simultaneous searches from parallel processes, because the web server treats the burst of requests as a kind of attack.
from pystac_client import Client

catalog = Client.open("...")
my_search = catalog.search(
    max_items=100,
    collections=["EO.Copernicus.S2.L2A"],
    bbox=[11.2, 46.4, 11.4, 46.5],
    query={"eo:cloud_cover": {"lt": 70}},
    datetime=["2023-01-01T00:00:00Z", "2023-01-02T00:00:00Z"],
    method="GET",
)
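To make the expectation concrete, here is a rough sketch of what that single GET /search request could look like, built with the Python standard library only. It assumes the API follows the usual STAC API item-search conventions for GET (comma-separated `collections` and `bbox`, a `start/end` `datetime` interval, and the query-extension filter sent as a JSON string). The root URL is a placeholder because the real endpoint is elided above, `build_search_url` is a hypothetical helper and not part of pystac-client, and `limit` is the server-side page size, so it only approximates the client-side `max_items` cap.

```python
import json
from urllib.parse import urlencode

# Placeholder root URL (assumption): the real endpoint is elided in the post.
STAC_API_ROOT = "https://stac.example.com"


def build_search_url(root, *, collections, bbox, datetime_range, query, limit):
    """Return the URL of one GET /search request (hypothetical helper)."""
    params = {
        # List-valued parameters are comma-separated on GET requests.
        "collections": ",".join(collections),
        "bbox": ",".join(str(value) for value in bbox),
        # A closed time interval is written as "start/end".
        "datetime": "/".join(datetime_range),
        # Assumption: the query extension accepts a JSON string on GET.
        "query": json.dumps(query, separators=(",", ":")),
        # Page size requested from the server, not a cap on total items.
        "limit": limit,
    }
    return f"{root.rstrip('/')}/search?{urlencode(params)}"


url = build_search_url(
    STAC_API_ROOT,
    collections=["EO.Copernicus.S2.L2A"],
    bbox=[11.2, 46.4, 11.4, 46.5],
    datetime_range=["2023-01-01T00:00:00Z", "2023-01-02T00:00:00Z"],
    query={"eo:cloud_cover": {"lt": 70}},
    limit=100,
)
print(url)
```

Comparing this URL against the requests seen in the proxy log makes it easy to tell the actual search call apart from the extra landing-page and collections requests.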
I think that initial request is needed to check the conformance classes the
API publishes. You might be able to disable it with
ignore_conformance=True. The docs have a bit of info on conformance:
https://pystac-client.readthedocs.io/en/stable/usage.html.
> this causes problems when I try to do simultaneous searches from parallel processes
That is a fair point. In the future it might make sense for pystac-client to make it easier to skip that initial GET.
In the meantime I took a look at the code and it doesn't look like ignore_conformance=True will skip the initial GET. That argument just makes it so the conformance classes are not considered when the user requests certain actions.
If you want to avoid any superfluous network calls, I would recommend skipping the Client object entirely and just using ItemSearch directly. Here is what that would look like:
from pystac_client import ItemSearch
search = ItemSearch(url="https://earth-search.aws.element84.com/v1/search", collections=["cop-dem-glo-30"], max_items=1)
Notice that the url ends in /search.
Thanks for the suggestion! I will try!