pvlib-python
pvlib-python copied to clipboard
Allow read_tmy3 to handle 00:00 timestamps
- ~~[ ] Closes #xxxx~~
- [x] I am familiar with the contributing guidelines
- [x] Tests added
- ~~[ ] Updates entries in
docs/sphinx/source/reference
for API changes.~~ - [x] Adds description and name entries in the appropriate "what's new" file in
docs/sphinx/source/whatsnew
for all changes. Includes link to the GitHub Issue with:issue:`num`
or this Pull Request with:pull:`num`
. Includes contributor name and/or GitHub username (link with:ghuser:`user`
). - [x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
- [x] Pull request is nearly complete and ready for detailed review.
- [x] Maintainer: Appropriate GitHub Labels (including
remote-data
) and Milestone are assigned to the Pull Request and linked Issue.
As mentioned by @williamhobbs in https://github.com/pvlib/pvlib-python/discussions/1310#discussioncomment-2996349, SolarAnywhere TMY3 files specify midnight as 00:00 instead of 24:00 (which is the NREL TMY3 convention). Currently, the pvlib read_tmy3
function incorrectly treats 00:00 and 24:00 equally by renaming both time stamps to 00:00 and shifting the time stamp one day ahead. This behavior is correct for the 24:00 timestamps but incorrectly shifts 00:00 instances one day ahead.
This PR proposes a minor change to handle both 00:00 and 24:00 timestamps correctly. Additionally, I've made trivial documentation updates to more closely follow the numpydoc style.
@KWagnerCPR could you provide a TMY3 file from SolarAnywhere that can be added to pvlib for testing purposes? Alternatively, with CPR's permission, I can also easily download one myself.
@AdamRJensen I've attached the full historical time series in TMY3 format for one of our Public sites here. SolarAnywhereData_TMY3.zip
@pvlib/pvlib-maintainer Ready for review :)
Does anyone know why it says "TMYData" 70 times in the documentation? I'd like to remove all those instances..
Also, I think the following line in the documentation might be incorrect: "NOTE, the index is currently timezone unaware, and times are set to local standard time (daylight savings is not included)". The index seems to have the timezone offset included - I'd appreciate a second pair of eyes on that though.
Does anyone know why it says "TMYData" 70 times in the documentation?
A long time ago in a commit far, far away, that's what the function's return value was called:
https://github.com/pvlib/pvlib-python/blob/fef3d518b6ebe719f05bb51beeb7dfa05b45b23c/pvlib/tmy.py#L42-L45
Inherited from the MATLAB code, presumably: https://github.com/sandialabs/MATLAB_PV_LIB/blob/master/pvl_readtmy3.m
Also, I think the following line in the documentation might be incorrect: "NOTE, the index is currently timezone unaware, and times are set to local standard time (daylight savings is not included)". The index seems to have the timezone offset included - I'd appreciate a second pair of eyes on that though.
This pair of eyes agrees with yours; the index includes a fixed UTC offset. The part about not including DST should be kept though.
I tried to remove the TMYData.
at some point, ran into some issue with rst table formatting, and decided it wasn't worth the effort. Maybe things have changed or you're more patient/efficient than I.
I just tagged this PR with
remote-data
so that the iotools tests run. I get the same encoding failure locally, FYI.
Huh! I was very surprised to see test failures. I could not replicate them locally, so I imagine it's a windows/linux thing. My best guess is that the ® (registered) symbol in the test file is not parseable on Linux (at least not by default). For now, I have just removed it from the file. I think the best thing would be for CPR to remove it generally from their files and perhaps replace it with (R) or the phrase "Registered, U.S. Patent and Trademark Office". Alternatively, we could look into specifying some specific encoding settings when opening the file 🤷 @kanderso-nrel thoughts?
In hindsight, requiring the
remote-data
tag to be present before running any iotools tests, not just the actual--remote-data
tests, is probably not actually what we want. Might be as simple as just removing the--ignore
options in the workflow command.
I'm not quite sure what you're talking about here..
For now, I have just removed it from the file.
I think technically that would be in violation of the CPR license file included in the zip archive posted earlier:
Licensee shall not remove, alter, cover, or disguise any acknowledgements, copyright notice, trademark, or other proprietary rights notice placed by Clean Power Research on the Data or any portion thereof.
I'm not optimistic that CPR would get rid of that character or change their file encoding just for us (maybe I'm wrong!), but I'm not sure what the best alternative is. FWIW specifying encoding='iso-8859-01'
(determined using the file command) gets it working for me locally, but I'm wary of hardcoding that in a general purpose function read_tmy3
. Maybe expose encoding
as an optional parameter and suggest a value for SolarAnywhere files?
I'm not quite sure what you're talking about here..
Sorry, was just thinking out loud that we should probably follow up #1306 with some tweaks. No changes needed here.
@kanderso-nrel I have added the registrered symbol back in the file. I could always make a custom TMY3 file with 00:00 timestamps for testing. But without the encoding
input parameter, SolarAnywhere TMY3 files would still not be parseable on Linux. While not ideal, I think the encoding
parameter is a decent option.
@KWagnerCPR It is probably of interest to you to know that the registered ® symbol may cause issues when parsing SolarAnywhere files. Although I've implemented a workaround, it might be an issue for other users / different software in the future. It may be advisable to replace the symbol with a sentence of equivalent legal meaning, e.g., "Registered, U.S. Patent and Trademark Office". Just for your consideration.
read_csv
also has an encoding_errors
kwarg since pandas 1.3. I've never used it but might help here. We could also try/except with default encoding and then iso-8859-01
. Or read the first N bytes of the file to see if it matches the SolarAnywhere format to determine the encoding.
Changing the SolarAnywhere encoding would break existing code, so I wouldn't recommend that unless they bumped their API to 4.0.
read_csv
also has anencoding_errors
kwarg since pandas 1.3. I've never used it but might help here. We could also try/except with default encoding and theniso-8859-01
. Or read the first N bytes of the file to see if it matches the SolarAnywhere format to determine the encoding.
The read_tmy3
function utilizes the built-in open
function, which has an argument called errors
. Options for the errors argument are ignore
and replace
, neither of which I think are great options. I have implemented try/except which I think is the best option (also better than adding an encoding
parameter).