pvlib-python Allow read_tmy3 to handle 00:00 timestamps

~~[ ] Closes #xxxx~~
[x] I am familiar with the contributing guidelines
[x] Tests added
~~[ ] Updates entries in docs/sphinx/source/reference for API changes.~~
[x] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
[x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
[x] Pull request is nearly complete and ready for detailed review.
[x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

As mentioned by @williamhobbs in https://github.com/pvlib/pvlib-python/discussions/1310#discussioncomment-2996349, SolarAnywhere TMY3 files specify midnight as 00:00 instead of 24:00 (which is the NREL TMY3 convention). Currently, the pvlib read_tmy3 function incorrectly treats 00:00 and 24:00 equally by renaming both time stamps to 00:00 and shifting the time stamp one day ahead. This behavior is correct for the 24:00 timestamps but incorrectly shifts 00:00 instances one day ahead.

This PR proposes a minor change to handle both 00:00 and 24:00 timestamps correctly. Additionally, I've made trivial documentation updates to more closely follow the numpydoc style.

Jul 19 '22 05:07 AdamRJensen

@KWagnerCPR could you provide a TMY3 file from SolarAnywhere that can be added to pvlib for testing purposes? Alternatively, with CPR's permission, I can also easily download one myself.

Jul 21 '22 18:07 AdamRJensen

@AdamRJensen I've attached the full historical time series in TMY3 format for one of our Public sites here. SolarAnywhereData_TMY3.zip

Jul 21 '22 20:07 KWagnerCPR

@pvlib/pvlib-maintainer Ready for review :)

Does anyone know why it says "TMYData" 70 times in the documentation? I'd like to remove all those instances..

Also, I think the following line in the documentation might be incorrect: "NOTE, the index is currently timezone unaware, and times are set to local standard time (daylight savings is not included)". The index seems to have the timezone offset included - I'd appreciate a second pair of eyes on that though.

Jul 22 '22 02:07 AdamRJensen

Does anyone know why it says "TMYData" 70 times in the documentation?

A long time ago in a commit far, far away, that's what the function's return value was called:

https://github.com/pvlib/pvlib-python/blob/fef3d518b6ebe719f05bb51beeb7dfa05b45b23c/pvlib/tmy.py#L42-L45

Inherited from the MATLAB code, presumably: https://github.com/sandialabs/MATLAB_PV_LIB/blob/master/pvl_readtmy3.m

Also, I think the following line in the documentation might be incorrect: "NOTE, the index is currently timezone unaware, and times are set to local standard time (daylight savings is not included)". The index seems to have the timezone offset included - I'd appreciate a second pair of eyes on that though.

This pair of eyes agrees with yours; the index includes a fixed UTC offset. The part about not including DST should be kept though.

Jul 22 '22 12:07 kandersolar

I tried to remove the TMYData. at some point, ran into some issue with rst table formatting, and decided it wasn't worth the effort. Maybe things have changed or you're more patient/efficient than I.

Jul 22 '22 16:07 wholmgren

I just tagged this PR with remote-data so that the iotools tests run. I get the same encoding failure locally, FYI.

Huh! I was very surprised to see test failures. I could not replicate them locally, so I imagine it's a windows/linux thing. My best guess is that the ® (registered) symbol in the test file is not parseable on Linux (at least not by default). For now, I have just removed it from the file. I think the best thing would be for CPR to remove it generally from their files and perhaps replace it with (R) or the phrase "Registered, U.S. Patent and Trademark Office". Alternatively, we could look into specifying some specific encoding settings when opening the file 🤷 @kanderso-nrel thoughts?

In hindsight, requiring the remote-data tag to be present before running any iotools tests, not just the actual --remote-data tests, is probably not actually what we want. Might be as simple as just removing the --ignore options in the workflow command.

I'm not quite sure what you're talking about here..

Jul 25 '22 05:07 AdamRJensen

For now, I have just removed it from the file.

I think technically that would be in violation of the CPR license file included in the zip archive posted earlier:

Licensee shall not remove, alter, cover, or disguise any acknowledgements, copyright notice, trademark, or other proprietary rights notice placed by Clean Power Research on the Data or any portion thereof.

I'm not optimistic that CPR would get rid of that character or change their file encoding just for us (maybe I'm wrong!), but I'm not sure what the best alternative is. FWIW specifying encoding='iso-8859-01' (determined using the file command) gets it working for me locally, but I'm wary of hardcoding that in a general purpose function read_tmy3. Maybe expose encoding as an optional parameter and suggest a value for SolarAnywhere files?

I'm not quite sure what you're talking about here..

Sorry, was just thinking out loud that we should probably follow up #1306 with some tweaks. No changes needed here.

Jul 25 '22 12:07 kandersolar

@kanderso-nrel I have added the registrered symbol back in the file. I could always make a custom TMY3 file with 00:00 timestamps for testing. But without the encoding input parameter, SolarAnywhere TMY3 files would still not be parseable on Linux. While not ideal, I think the encoding parameter is a decent option.

@KWagnerCPR It is probably of interest to you to know that the registered ® symbol may cause issues when parsing SolarAnywhere files. Although I've implemented a workaround, it might be an issue for other users / different software in the future. It may be advisable to replace the symbol with a sentence of equivalent legal meaning, e.g., "Registered, U.S. Patent and Trademark Office". Just for your consideration.

Jul 26 '22 01:07 AdamRJensen

read_csv also has an encoding_errors kwarg since pandas 1.3. I've never used it but might help here. We could also try/except with default encoding and then iso-8859-01. Or read the first N bytes of the file to see if it matches the SolarAnywhere format to determine the encoding.

Changing the SolarAnywhere encoding would break existing code, so I wouldn't recommend that unless they bumped their API to 4.0.

Jul 26 '22 15:07 wholmgren

read_csv also has an encoding_errors kwarg since pandas 1.3. I've never used it but might help here. We could also try/except with default encoding and then iso-8859-01. Or read the first N bytes of the file to see if it matches the SolarAnywhere format to determine the encoding.

The read_tmy3 function utilizes the built-in open function, which has an argument called errors. Options for the errors argument are ignore and replace, neither of which I think are great options. I have implemented try/except which I think is the best option (also better than adding an encoding parameter).

Jul 26 '22 23:07 AdamRJensen

pvlib-python pvlib-python copied to clipboard

Allow read_tmy3 to handle 00:00 timestamps

pvlib-python
pvlib-python copied to clipboard