pvlib-python icon indicating copy to clipboard operation
pvlib-python copied to clipboard

Handle time normalization for nonexistent and ambiguous times

Open scttnlsn opened this issue 1 year ago • 10 comments

  • [x] Closes #2132
  • [x] I am familiar with the contributing guidelines
  • [x] Tests added
  • [x] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • [x] Pull request is nearly complete and ready for detailed review.
  • [x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

scttnlsn avatar Jul 15 '24 19:07 scttnlsn

Started running the code from this branch on a real problem and realized the handling of ambiguous='infer' is not sufficient. The issue can be reproduced in the following test case:

eot = np.array([-3.935172, -4.117227, -4.026295])
longitude = 82.3666
times = pd.DatetimeIndex([
    '2014-11-02 09:00:00',
    '2014-11-02 10:00:00',
    '2014-11-02 11:00:00',
]).tz_localize('America/Havana')
solarposition.hour_angle(times, longitude, eot)

We get:

pytz.exceptions.AmbiguousTimeError: There are 2 dst switches when there should only be 1.

I think the infer piece requires exactly 2 duplicate times but we may have more in our times array. Not sure what the best solution is at this point. Any ideas?

scttnlsn avatar Jul 17 '24 14:07 scttnlsn

I suppose we could look at the UTC offsets to build a list of bools indicating which times are DST. We'd then pass ambigious=dst_list (https://pandas.pydata.org/docs/reference/api/pandas.Series.tz_localize.html).

EDIT: looks like the tzinfo already has info about DST

scttnlsn avatar Jul 17 '24 14:07 scttnlsn

Hmm...I see that I didn't think enough about the tests :) According to the pandas docs infer requires the timestamps to be monotonic, which they aren't after normalize.

cwhanse avatar Jul 17 '24 15:07 cwhanse

This passes tests, but is likely slow due to the looping. I'm now thinking that perhaps ambiguous=NaT would be better.

The issue arises only when a user wants to use DST-aware timestamps in a few places and is not careful when constructing the series of timestamps. It seems better to suggest the user localize using a non-DST timezone such as GMT-4. As far as importing data recorded with DST timestamps, if the timestamps are correct, the ambiguous times won't be in the data.

cwhanse avatar Jul 17 '24 16:07 cwhanse

It seems better to suggest the user localize using a non-DST timezone such as GMT-4

That's a good point. In my own use case I could handle the DST stuff before passing the times into hour_angle.

I'm now thinking that perhaps ambiguous=NaT would be better.

I think that would result in nans being returned from hour_angle which I think could be confusing. I think even letting the AmbiguousTimeError be raised is fine if the documentation indicates that DST should be handled outside pvlib. That way the user is forced to address what's happening instead of maybe being confused about receiving nans.

scttnlsn avatar Jul 17 '24 17:07 scttnlsn

I think even letting the AmbiguousTimeError be raised is fine if the documentation indicates that DST should be handled outside pvlib. That way the user is forced to address what's happening instead of maybe being confused about receiving nans.

+1

cwhanse avatar Jul 17 '24 18:07 cwhanse

Do you think I should just close this PR then? For my own use case, if I'm handling the DST stuff outside pvlib anyway then I don't expect to see a NonExistentTimeError either. If you'd like to keep the nonexistent='shift_forward' I can just remove the ambiguous=is_dst stuff.

scttnlsn avatar Jul 17 '24 18:07 scttnlsn

@scttnlsn can you add yourself to the Contributors list in the whatsnew file? I can do that for you, if needed.

cwhanse avatar Jul 22 '24 16:07 cwhanse

@pvlib/pvanalytics-maintainer if no objections are raised, I will merge this on 9/13

cwhanse avatar Sep 11 '24 15:09 cwhanse

What about _local_times_from_hours_since_midnight and _times_to_hours_after_local_midnight? I think they too are subject to the same bug of tz_localize and normalize and can be fixed together in the same way easily.

https://github.com/pvlib/pvlib-python/blob/cbfa29237d9b303cb4c07caaa486403384aca6a7/pvlib/solarposition.py#L1410-L1419

https://github.com/pvlib/pvlib-python/blob/cbfa29237d9b303cb4c07caaa486403384aca6a7/pvlib/solarposition.py#L1422-L1426

yhkee0404 avatar Sep 12 '24 05:09 yhkee0404