pystac-client

search result depends on datetime format

Open johntruckenbrodt opened this issue 1 year ago • 5 comments

Hi. I just observed some unexpected behavior when comparing the search results of queries that pass datetime as a string and as a datetime.datetime object. Can you help me out here?

The first query returns 20 scenes within the defined time interval. The second query returns only 7 scenes, which lie outside the interval.

```python
from datetime import datetime
from pystac_client import Client


def get_href(item):
    # return the href of the item's first asset
    assets = item.assets
    ref = assets[list(assets.keys())[0]]
    return ref.href


################################################################
url = 'https://planetarycomputer.microsoft.com/api/stac/v1'
collections = ['sentinel-1-grd']

acq_start = '2022-02-02T12:00:00'
acq_end = '2022-02-02T12:30:00'

# strptime without %z returns timezone-naive datetime objects
pattern = '%Y-%m-%dT%H:%M:%S'
acq_start_dt = datetime.strptime(acq_start, pattern)
acq_end_dt = datetime.strptime(acq_end, pattern)

catalog = Client.open(url)
################################################################
# query 1: datetime as str
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start, acq_end])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
################################################################
# query 2: datetime as datetime.datetime
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start_dt, acq_end_dt])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
```

johntruckenbrodt, Feb 21 '24 11:02

I'm not sure if it's the cause of the observed difference, but pystac-client will "pad" the end datetime when you pass a string. See the datetime parameter at https://pystac-client.readthedocs.io/en/stable/api.html#item-search.
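As a rough illustration of what "padding" means for a partial value, here is a hypothetical helper (`expand_day_range` is not a pystac-client function) that widens a date-only string to cover the whole day in UTC. It is a sketch of the idea described in the linked docs, not pystac-client's actual implementation, and it only handles the `YYYY-MM-DD` case.

```python
from datetime import date, datetime, time, timezone


def expand_day_range(day: str) -> str:
    """Hypothetical helper: expand 'YYYY-MM-DD' to a full-day UTC interval string."""
    d = date.fromisoformat(day)
    # start of the day and last full second of the day, both in UTC
    start = datetime.combine(d, time(0, 0, 0), tzinfo=timezone.utc)
    end = datetime.combine(d, time(23, 59, 59), tzinfo=timezone.utc)
    # render both ends with a trailing 'Z' and join them with '/' as in a STAC interval
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


print(expand_day_range("2022-02-02"))  # 2022-02-02T00:00:00Z/2022-02-02T23:59:59Z
```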

I wonder what you get if you set pattern = "%Y-%m-%dT%H:%M:%S.%f" or "%Y-%m-%dT%H:%M:%S.999"
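For reference, `%f` in a `strptime` pattern consumes a fractional-seconds field (up to six digits), while other literal characters in the pattern, such as `.999`, must appear verbatim in the input. A small standard-library check, assuming the input string carries a `.999` fraction:

```python
from datetime import datetime

# %f parses the fractional part; '.999' is read as 999000 microseconds
end_f = datetime.strptime("2022-02-02T12:30:00.999", "%Y-%m-%dT%H:%M:%S.%f")

# here '.999' is a literal in the pattern, so the input must end in '.999'
# and the parsed value carries no fractional seconds
end_lit = datetime.strptime("2022-02-02T12:30:00.999", "%Y-%m-%dT%H:%M:%S.999")

print(end_f.microsecond, end_lit.microsecond)  # 999000 0
print(end_f.tzinfo, end_lit.tzinfo)  # None None: both results are still naive
```

Either way the parsed object stays timezone-naive, so neither tweak changes how the value is mapped to UTC.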

TomAugspurger, Feb 21 '24 12:02

Thanks a lot @TomAugspurger for helping to solve this. I changed the strings and the pattern but still get the same result.
In the second case, the time is shifted one hour earlier, as can be seen when calling ItemSearch.url_with_parameters:

...datetime=2022-02-02T12%3A00%3A00Z%2F2022-02-02T12%3A30%3A00Z...
...datetime=2022-02-02T11%3A00%3A00Z%2F2022-02-02T11%3A30%3A00Z...
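Decoding the two query-string values with the standard library makes the one-hour shift easier to read. The variable names below are illustrative; the values are copied from the URLs above.

```python
from urllib.parse import unquote

# datetime values from the two search URLs quoted above
query_1 = "2022-02-02T12%3A00%3A00Z%2F2022-02-02T12%3A30%3A00Z"  # datetime given as str
query_2 = "2022-02-02T11%3A00%3A00Z%2F2022-02-02T11%3A30%3A00Z"  # datetime given as datetime.datetime

for value in (query_1, query_2):
    # %3A decodes to ':' and %2F to '/', the separator of a STAC datetime interval
    start, end = unquote(value).split("/")
    print(start, "to", end)
```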

johntruckenbrodt, Feb 21 '24 12:02

Okay, so the problem is in ItemSearch._to_utc_isoformat. The following line converts all datetime.datetime objects to UTC: https://github.com/stac-utils/pystac-client/blob/e50483de27a2ec0c4fb19961019fc5ac2451da46/pystac_client/item_search.py#L442 In my case, the one-hour difference presumably comes from the offset between UTC and CET, my local time zone.

```python
>>> from datetime import datetime, timezone
>>> acq_start = '2022-02-02T12:00:00'
>>> pattern = '%Y-%m-%dT%H:%M:%S'
>>> acq_start_dt = datetime.strptime(acq_start, pattern)
>>> acq_start_dt_utc = acq_start_dt.astimezone(timezone.utc)
>>> print(acq_start_dt_utc)
2022-02-02 11:00:00+00:00
```
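The REPL output above depends on the machine's local time zone. To reproduce the one-hour shift anywhere, the sketch below uses a fixed +01:00 offset as a stand-in for CET (an assumption: it ignores daylight saving time, which is not in effect in February) and contrasts interpreting the naive value as UTC with converting it from local time.

```python
from datetime import datetime, timedelta, timezone

# fixed +01:00 offset standing in for CET (assumption; no DST handling)
CET = timezone(timedelta(hours=1), "CET")

naive = datetime.strptime("2022-02-02T12:00:00", "%Y-%m-%dT%H:%M:%S")

# interpretation: attach UTC to the naive value without changing the wall-clock time
interpreted = naive.replace(tzinfo=timezone.utc)

# conversion: treat the naive value as CET, then convert to UTC,
# which is what naive.astimezone(timezone.utc) does on a machine set to CET
converted = naive.replace(tzinfo=CET).astimezone(timezone.utc)

print(interpreted.isoformat())  # 2022-02-02T12:00:00+00:00
print(converted.isoformat())    # 2022-02-02T11:00:00+00:00
```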

johntruckenbrodt, Feb 21 '24 13:02

Ah, I was going to suggest timezones as a potential issue, but I misread the docs. I thought the docs indicated tz-naive datetimes would be interpreted as UTC (i.e. dt.replace(tzinfo=datetime.timezone.utc) in code) rather than converted from your locale to UTC.

Converting timezone-naive datetimes, rather than interpreting them as UTC, is a bit surprising to me. I'd be open to deprecating the current behavior unless there's strong precedent elsewhere in the stac-utils ecosystem around how tz-naive datetimes are treated.
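A minimal sketch of the "interpret, don't convert" behavior discussed above. `to_utc_isoformat` here is a hypothetical stand-alone function, not pystac-client's `ItemSearch._to_utc_isoformat`; it treats naive datetimes as UTC and converts aware ones.

```python
from datetime import datetime, timedelta, timezone


def to_utc_isoformat(dt: datetime) -> str:
    """Hypothetical helper: format dt as a UTC timestamp ending in 'Z'."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # naive: interpret the wall-clock time as UTC (no local-time conversion)
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        # aware: convert to UTC
        dt = dt.astimezone(timezone.utc)
    # fractional seconds are dropped for brevity
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


naive = datetime(2022, 2, 2, 12, 0, 0)
aware = datetime(2022, 2, 2, 12, 0, 0, tzinfo=timezone(timedelta(hours=1)))
print(to_utc_isoformat(naive))  # 2022-02-02T12:00:00Z
print(to_utc_isoformat(aware))  # 2022-02-02T11:00:00Z
```

With this rule, a naive datetime.datetime and the equivalent string without a timezone map to the same UTC timestamp, which is the consistency the original report was after.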

TomAugspurger, Feb 21 '24 15:02

> I'd be open to deprecating the current behavior unless there's strong precedent elsewhere in the stac-utils ecosystem around how tz-naive datetimes are treated.

There's not a "usual" behavior as far as I know, so changing pystac-client to do the least surprising thing gets a 👍🏼 from me.

gadomski, Feb 21 '24 16:02

🤔 I dug into this a bit, and I think pystac-client is doing "the right thing", i.e. interpreting strings w/o timezone information as UTC: https://github.com/stac-utils/pystac-client/blob/df117b65b0ee72d64a135d6ebf96f6f82dd83261/pystac_client/item_search.py#L475-L498

@johntruckenbrodt I think the two different results are correct for your original post, since acq_start_dt = datetime.strptime(acq_start, pattern) is creating a datetime in your local TZ, which differs from how pystac-client interprets the string.
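Independently of how a given pystac-client version treats naive values, one way to avoid the ambiguity in the original script is to build timezone-aware datetimes up front. A standard-library sketch of two options (the "+00:00" suffix on the input string is an addition for illustration; the original strings had no offset):

```python
from datetime import datetime, timezone

acq_start = "2022-02-02T12:00:00"
pattern = "%Y-%m-%dT%H:%M:%S"

# option 1: parse as before, then declare the naive result to be UTC
start_1 = datetime.strptime(acq_start, pattern).replace(tzinfo=timezone.utc)

# option 2: carry the offset in the string and parse it with %z
start_2 = datetime.strptime(acq_start + "+00:00", pattern + "%z")

print(start_1 == start_2, start_1.utcoffset())  # True 0:00:00
```

Values built this way could then be passed as `datetime=[start, end]` as in the original script; since they are aware, the documented conversion to UTC no longer depends on the machine's local time zone.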

gadomski, May 09 '24 12:05

Ah, I see. Wouldn't it then be more intuitive to assume the same time zone for strings and datetime objects when the user does not specify one?
As I read it, how timezone-unaware objects are interpreted is up to the application.

Currently, the documentation of ItemSearch reads:

> Instances of datetime.datetime may be either timezone aware or unaware. Timezone aware instances will be converted to a UTC timestamp before being passed to the endpoint. Timezone unaware instances are assumed to represent UTC timestamps.

However, datetime.strptime(acq_start, pattern) creates an unaware object, and that object is still converted to UTC as if it were local time.

johntruckenbrodt, May 14 '24 12:05

@johntruckenbrodt right you are, thanks for sticking with it. I agree; #686 should fix it. LMK if that matches your understanding.

gadomski, May 14 '24 17:05

Great! IMO this fixes it. Thanks a lot.

johntruckenbrodt, May 15 '24 06:05