pystac-client
search result depends on datetime format
Hi. I just observed some unexpected behavior when comparing the search results from queries with datetime given as a string and as a datetime.datetime object. Can you help me out here?
The first query returns 20 scenes in the defined time interval. The second query only returns 7 scenes outside the interval.
from datetime import datetime
from pystac_client import Client
def get_href(item):
    # Return the href of the item's first asset.
    assets = item.assets
    ref = assets[list(assets.keys())[0]]
    return ref.href
################################################################
url = 'https://planetarycomputer.microsoft.com/api/stac/v1'
collections = ['sentinel-1-grd']
acq_start = '2022-02-02T12:00:00'
acq_end = '2022-02-02T12:30:00'
pattern = '%Y-%m-%dT%H:%M:%S'
acq_start_dt = datetime.strptime(acq_start, pattern)
acq_end_dt = datetime.strptime(acq_end, pattern)
catalog = Client.open(url)
################################################################
# query 1: datetime as str
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start, acq_end])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
################################################################
# query 2: datetime as datetime.datetime
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start_dt, acq_end_dt])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
I'm not sure if it's the cause of the observed difference, but pystac-client will "pad" the end datetime when you pass a string. See the datetime parameter at https://pystac-client.readthedocs.io/en/stable/api.html#item-search.
I wonder what you get if you set pattern = "%Y-%m-%dT%H:%M:%S.%f" or "%Y-%m-%dT%H:%M:%S.999"
Thanks a lot @TomAugspurger for helping to solve this. I changed the strings and pattern but get the same result.
In the second case, the time is shifted one hour earlier, as can be seen when calling ItemSearch.url_with_parameters:
...datetime=2022-02-02T12%3A00%3A00Z%2F2022-02-02T12%3A30%3A00Z...
...datetime=2022-02-02T11%3A00%3A00Z%2F2022-02-02T11%3A30%3A00Z...
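Decoded, the two query fragments make the one-hour shift easier to see. This is a minimal sketch using only the standard library; the variable names are illustrative, and the strings are the datetime values copied from the two URLs above.

```python
from urllib.parse import unquote

# Percent-encoded datetime values from the two search URLs
# (first: string input, second: datetime.datetime input).
first = "2022-02-02T12%3A00%3A00Z%2F2022-02-02T12%3A30%3A00Z"
second = "2022-02-02T11%3A00%3A00Z%2F2022-02-02T11%3A30%3A00Z"

# %3A decodes to ":" and %2F to "/", the interval separator.
decoded_first = unquote(first)
decoded_second = unquote(second)

print(decoded_first)   # 2022-02-02T12:00:00Z/2022-02-02T12:30:00Z
print(decoded_second)  # 2022-02-02T11:00:00Z/2022-02-02T11:30:00Z
```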
Okay, so the problem is in ItemSearch._to_utc_isoformat. The following line converts all datetime.datetime objects to UTC:
https://github.com/stac-utils/pystac-client/blob/e50483de27a2ec0c4fb19961019fc5ac2451da46/pystac_client/item_search.py#L442
I guess in my case the one hour difference comes from the difference between UTC and CET where I am.
>>> from datetime import datetime, timezone
>>> acq_start = '2022-02-02T12:00:00'
>>> pattern = '%Y-%m-%dT%H:%M:%S'
>>> acq_start_dt = datetime.strptime(acq_start, pattern)
>>> acq_start_dt_utc = acq_start_dt.astimezone(timezone.utc)
>>> print(acq_start_dt_utc)
2022-02-02 11:00:00+00:00
Ah, I was going to suggest timezones as a potential issue, but I misread the docs. I thought the docs indicated tz-naive datetimes would be interpreted as UTC (i.e. dt.replace(tzinfo=datetime.timezone.utc) in code) rather than converted from your locale to UTC.
Converting timezone-naive datetimes, rather than interpreting them as UTC, is a bit surprising to me. I'd be open to deprecating the current behavior unless there's strong precedent elsewhere in the stac-utils ecosystem around how tz-naive datetimes are treated.
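The difference between "interpreting" and "converting" a tz-naive datetime can be sketched as follows. To keep the result independent of the machine's locale, the example assumes a fixed +01:00 offset as a stand-in for CET instead of relying on the system timezone; that offset and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("2022-02-02T12:00:00", "%Y-%m-%dT%H:%M:%S")

# Interpret: keep the clock reading and label it as UTC.
interpreted = naive.replace(tzinfo=timezone.utc)

# Convert: treat the clock reading as local time (simulated here
# with a fixed +01:00 offset standing in for CET), then shift to UTC.
cet = timezone(timedelta(hours=1))
converted = naive.replace(tzinfo=cet).astimezone(timezone.utc)

print(interpreted.isoformat())  # 2022-02-02T12:00:00+00:00
print(converted.isoformat())    # 2022-02-02T11:00:00+00:00
```

Calling naive.astimezone(timezone.utc) directly behaves like the second case, with the offset taken from the system's local timezone.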
I'd be open to deprecating the current behavior unless there's strong precedent elsewhere in the stac-utils ecosystem around how tz-naive datetimes are treated.
There's not a "usual" behavior as far as I know, so changing pystac-client to do the least surprising thing gets a 👍🏼 from me.
🤔 I dug into this a bit, and I think pystac-client is doing "the right thing", i.e. interpreting strings w/o timezone information as UTC: https://github.com/stac-utils/pystac-client/blob/df117b65b0ee72d64a135d6ebf96f6f82dd83261/pystac_client/item_search.py#L475-L498
@johntruckenbrodt I think two different results are correct for your original post, since acq_start_dt = datetime.strptime(acq_start, pattern) is creating a datetime in your local TZ, which will be different than how pystac-client interprets the string.
Ah I see. Wouldn't it then be more intuitive to assume the same time zone for strings and datetime objects if not specified by the user?
From how I interpret this, the time zone interpretation for unaware objects is up to the application.
Currently, the documentation of ItemSearch reads:
Instances of datetime.datetime may be either timezone aware or unaware. Timezone aware instances will be converted to a UTC timestamp before being passed to the endpoint. Timezone unaware instances are assumed to represent UTC timestamps.
However, datetime.strptime(acq_start, pattern) is creating an unaware object and the object is still converted to UTC.
@johntruckenbrodt right you are, thanks for sticking with it. I agree, #686 should fix this. LMK if that matches your understanding.
Great! IMO this fixes it. Thanks a lot.
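On a pystac-client version that still shows the behavior described in this thread, one possible workaround is to make every datetime timezone-aware before passing it to the search. A minimal sketch; as_utc is a hypothetical helper, not part of pystac-client:

```python
from datetime import datetime, timedelta, timezone


def as_utc(dt: datetime) -> datetime:
    """Hypothetical helper: return dt as a timezone-aware UTC datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Naive input: treat the clock reading as UTC, as the docs describe.
        return dt.replace(tzinfo=timezone.utc)
    # Aware input: convert to UTC.
    return dt.astimezone(timezone.utc)


pattern = "%Y-%m-%dT%H:%M:%S"
acq_start_dt = as_utc(datetime.strptime("2022-02-02T12:00:00", pattern))
acq_end_dt = as_utc(datetime.strptime("2022-02-02T12:30:00", pattern))

print(acq_start_dt.isoformat())  # 2022-02-02T12:00:00+00:00
print(acq_end_dt.isoformat())    # 2022-02-02T12:30:00+00:00
```

The resulting aware objects can then be passed as datetime=[acq_start_dt, acq_end_dt] to catalog.search, so the query no longer depends on the local timezone of the machine.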