pvlib-python
Add functions to read and retrieve SolarAnywhere irradiance data
- [x] Closes #1310
- [x] I am familiar with the contributing guidelines
- [x] Tests added
- [x] Updates entries in `docs/sphinx/source/reference` for API changes.
- [x] Adds description and name entries in the appropriate "what's new" file in `docs/sphinx/source/whatsnew` for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
- [x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
- [x] Pull request is nearly complete and ready for detailed review.
- [x] Maintainer: Appropriate GitHub Labels (including `remote-data`) and Milestone are assigned to the Pull Request and linked Issue.
Description

This PR adds functions for reading and retrieving SolarAnywhere data, as discussed in #1310. SolarAnywhere is a commercial dataset, although it provides irradiance data for a few locations for free.
Minor comments

The SolarAnywhere API is asynchronous, hence two requests are made for each data retrieval (a rough sketch of this pattern is shown below).

The `get_solaranywhere` function is currently not designed to handle annual and monthly data, as this seems less relevant to pvlib users.
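A minimal sketch of a submit-then-poll flow, for orientation only: the endpoint paths, header, and JSON field names below are hypothetical placeholders, not the actual SolarAnywhere API or the implementation in this PR.

```python
import time

import requests

# Hypothetical sketch of an asynchronous submit-then-poll pattern.
# Endpoint paths, headers, and JSON field names are placeholders.
def fetch_async(base_url, payload, api_key, poll_interval=10, max_response_time=300):
    headers = {'X-Api-Key': api_key, 'Content-Type': 'application/json'}
    # First request: submit the data request and receive an identifier.
    submit = requests.post(f'{base_url}/data-request', json=payload, headers=headers)
    submit.raise_for_status()
    request_id = submit.json()['requestId']  # placeholder field name

    # Second request (repeated until ready): poll for the finished result.
    waited = 0
    while waited < max_response_time:
        result = requests.get(f'{base_url}/data-result/{request_id}', headers=headers)
        result.raise_for_status()
        body = result.json()
        if body.get('status') == 'Done':  # placeholder status value
            return body
        time.sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError('SolarAnywhere request did not complete in time')
```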
@KWagnerCPR It would be great if you could test the functions and give some feedback on the default values. :smile:
@AdamRJensen Yes, I'll have a few people on our team test it out and get back to you. Which default values are you referring to?
@KWagnerCPR In regards to default values I'm thinking of these:
```python
def get_solaranywhere(latitude, longitude, api_key, start=None, end=None,
                      time_resolution=60, spatial_resolution=0.1,
                      true_dynamics=False, source='SolarAnywhereLatest',
                      variables=DEFAULT_VARIABLES, missing_data='Omit',
                      url=URL, map_variables=True, max_response_time=300)
```
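For reference, a hedged usage sketch based on the signature above: it assumes the function is exposed as `pvlib.iotools.get_solaranywhere`, returns a (data, metadata) tuple like other pvlib iotools functions, and accepts timezone-aware timestamps for `start`/`end`; the coordinates and key are placeholders.

```python
import pandas as pd

import pvlib

# Hedged usage sketch based on the signature above; the return value and the
# accepted start/end types are assumptions, and the coordinates/key are placeholders.
start = pd.Timestamp('2020-06-01', tz='UTC')
end = pd.Timestamp('2020-06-03', tz='UTC')
data, meta = pvlib.iotools.get_solaranywhere(
    latitude=44.47, longitude=-73.21, api_key='your-api-key',
    start=start, end=end)
```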
Also, I have chosen not to parse any annual and monthly data, as that seems to be outside the scope of pvlib and would just unnecessarily complicate the function.
Furthermore, the `get_solaranywhere` function currently does not support retrieval of "probability of exceedance" time series, as the public license does not allow me to retrieve those.
@AdamRJensen In terms of default values, I would recommend the following: `time_resolution=60`, `spatial_resolution=0.01`, `true_dynamics=False`, `source='SolarAnywhereLatest'`, `missing_data='FillAverage'`.
With SolarAnywhere Public, users get access to 1 km (0.01 x 0.01 degree) spatial resolution data, so I think it makes sense to set the default as 0.01 if our focus is on time series requests rather than TGY requests. Also, using the "FillAverage" default for missing data will ensure periods of missing satellite imagery are filled with long term averages for those times, which is likely preferable.
You should be able to request PXX data as part of the SolarAnywhere public license. The issue might have been that you had the spatial resolution set to 0.1 when you requested it. You also have to include `"ProbabilityOfExceedance": 90` in the request (a sketch is included after this comment).
I think it makes sense not to request the monthly and annual totals at this time.
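To illustrate, a hedged sketch of how that key might be set on the request options; only `"ProbabilityOfExceedance": 90` and `"WeatherDataSource"` come from this discussion, while the remaining key names and structure are placeholders.

```python
# Hedged sketch: only "ProbabilityOfExceedance" and "WeatherDataSource" are taken
# from the discussion above; the other key names are illustrative placeholders.
options = {
    "WeatherDataSource": "SolarAnywhereLatest",
    "SpatialResolution_Degrees": 0.01,  # placeholder key name
    "TimeResolution_Minutes": 60,       # placeholder key name
}

probability_of_exceedance = 90  # corresponds to the PR's new keyword argument
if probability_of_exceedance is not None:
    options["ProbabilityOfExceedance"] = probability_of_exceedance
```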
@KWagnerCPR I have updated the default values accordingly. Thanks for the feedback! :smile:
Also, I figured out how to work with PXX data, so I've made some modifications to handle that and added a keyword called `probability_of_exceedance`.
@pvlib/pvlib-maintainer I am thinking of developing primarily mock tests for the `get_solaranywhere` function, as the SolarAnywhere data retrieval is slow (minutes); a rough sketch of what a mocked test could look like is included below. However, I would like to include at least one real request of historical data (just two days of data, which keeps the runtime reasonable). Thus, unless I hear any objections, I will go ahead and register the pvlib email for a free SolarAnywhere account and add a GitHub Secret called `SOLARANYWHERE_API_KEY`. If there is a desire to avoid live tests entirely, let me know. FYI, I imagine writing mock tests for this will be a real pain.
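A hedged sketch of a mocked test, assuming the function lives in a `pvlib.iotools.solaranywhere` module and calls `requests.post`/`requests.get` internally; the patched paths, fake payload, and assertion are placeholders rather than the actual test code.

```python
import unittest.mock as mock

import pandas as pd
import pvlib

# Hedged sketch of a mocked test. The patch targets assume the function calls
# requests.post/requests.get from a pvlib.iotools.solaranywhere module; a real
# test would replace the empty payload with a captured SolarAnywhere response
# so the parsing code has realistic data to work on.
def test_get_solaranywhere_mocked():
    fake_response = mock.Mock()
    fake_response.ok = True
    fake_response.json.return_value = {}  # replace with a captured API response

    with mock.patch('pvlib.iotools.solaranywhere.requests.post',
                    return_value=fake_response), \
         mock.patch('pvlib.iotools.solaranywhere.requests.get',
                    return_value=fake_response):
        data, meta = pvlib.iotools.get_solaranywhere(
            latitude=44.47, longitude=-73.21, api_key='fake-key',
            start=pd.Timestamp('2020-06-01', tz='UTC'),
            end=pd.Timestamp('2020-06-02', tz='UTC'))

    assert isinstance(data, pd.DataFrame)
```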
@AdamRJensen We've tested these functions and they are working well!
One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script ("When requesting non-TMY data, specifying `start` and `end` is required."). Instead I get: `AttributeError: 'NoneType' object has no attribute 'tz'`. Not a huge deal, just wanted to point it out.
I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developer documentation. Do you know if these will get documented in the official pvlib documentation? https://pvlib-python.readthedocs.io/en/stable/index.html
> One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script ("When requesting non-TMY data, specifying `start` and `end` is required."). Instead I get: `AttributeError: 'NoneType' object has no attribute 'tz'`. Not a huge deal, just wanted to point it out.
@KWagnerCPR Thank you for catching this error. I had forgotten to raise the error, hence nothing was happening; it has been corrected (a sketch of the intended check is shown below).
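A minimal sketch of the intended guard, assuming `start` and `end` default to None; how non-TMY requests are detected here is a placeholder, not the code actually merged.

```python
# Hedged sketch of the intended validation; the non-TMY detection is a placeholder.
def _check_start_end(start, end, source):
    non_tmy = source == 'SolarAnywhereLatest'  # placeholder condition
    if non_tmy and (start is None or end is None):
        raise ValueError("When requesting non-TMY data, specifying `start` "
                         "and `end` is required.")
```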
> I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developer documentation. Do you know if these will get documented in the official pvlib documentation? https://pvlib-python.readthedocs.io/en/stable/index.html
Once the functions get merged and we make a new release of pvlib, the documentation will be available publicly. We're releasing version 0.9.2 within a few weeks, but I can't promise that this pull request will be merged by then.
@AdamRJensen Thank you! I'll stay tuned.
When downloading data from the SolarAnywhere website, missing data may either be filled with average data or represented as missing using one of three options.

Missing values are by default correctly converted to NumPy NaN. Similarly, I believe that the 'NaN' option would also be correctly converted to NumPy NaN, since we're using pd.read_csv (though I haven't tested it). However, -999 will have to be converted to NumPy NaN by the `read_solaranywhere` function (a sketch of that conversion is shown below).
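For illustration, either of these pandas approaches could handle the -999 sentinel; the filename and read_csv arguments are placeholders and do not reflect the real file's header layout.

```python
import numpy as np
import pandas as pd

# Hedged sketch: treat -999 as missing when reading the file. The real
# SolarAnywhere header/skiprows handling is not shown here.
data = pd.read_csv('solaranywhere_file.csv', na_values=[-999, '-999'])

# Equivalent post-hoc conversion on an already-parsed DataFrame:
data = data.replace(-999, np.nan)
```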
@KWagnerCPR Do you have a file or a known station/time period where there is missing data that I could use for testing?
The remote test failures are caused by this line:

`solaranywhere_api_key = os.environ["SOLARANYWHERE_API_KEY"]`

which makes me assume that the API key added to the GitHub Secrets by @kanderso-nrel isn't named what I think it is... (A more defensive lookup that skips instead of erroring is sketched below.)
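A hedged sketch of such a lookup, assuming pytest is used for the remote-data tests; the fixture name and skip behavior are illustrative, not necessarily how the pvlib test suite is organized.

```python
import os

import pytest

# Hedged sketch: skip the live SolarAnywhere test when the secret is not
# exposed (e.g. on forks), instead of failing with a KeyError.
# The fixture name is illustrative.
@pytest.fixture
def solaranywhere_api_key():
    key = os.environ.get('SOLARANYWHERE_API_KEY')
    if key is None:
        pytest.skip('SOLARANYWHERE_API_KEY not set')
    return key
```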
I think it is just the annoying but secure behavior of `pull-request-target` not respecting `yml` changes in the PR -- your update to include the SA key in the environment isn't taking effect because the action is defined by the `.github/workflows/pytest-remote-data.yml` file on `master`, not this PR branch. I guess the cleanest workaround is to submit a separate PR to change `pytest-remote-data.yml`, merge that, then merge master into this PR?
@AdamRJensen I was taking a look at this section of the pvlib documentation: https://pvlib-python.readthedocs.io/en/stable/reference/iotools.html and I didn't see the documentation for these functions to retrieve and read SolarAnywhere data. Has the official documentation been posted for these?
@KWagnerCPR they won't appear in the stable docs until the PR is merged. You can preview the docs here: https://pvlib-python--1497.org.readthedocs.build/en/1497/
@cwhanse I see, thank you for sharing that link! Looks like this item was slated for the 0.10.2 release on September 6, but was then deferred. Do you have an update on when the next release may take place?
@KWagnerCPR I've added this to the v0.10.3 milestone, which is scheduled to be released mid-December. But (as seen in the previous milestones for this PR), that's just an aspiration, not a guarantee. Getting this PR merged depends on reviewer availability, which is in rather short supply at the moment.
@kandersolar Understood. Thank you for this background info!
@kandersolar Could you help with the flake8 issue - I can't really tell what it is?
> @kandersolar Could you help with the flake8 issue - I can't really tell what it is?
Do you get the error if the csv files are read with pandas?
> @kandersolar Could you help with the flake8 issue - I can't really tell what it is?
I think the problem is this:

- To use flake8's `--diff` option, we are piping `git diff` into flake8 via stdin rather than letting flake8 discover and read files itself
- That means flake8 has to identify .py files by parsing the `git diff` text provided on stdin, which means it wants to decode the stdin text to UTF-8 first
- The CSV data file has that troublesome trademark character that cannot be decoded as UTF-8. This character is included in the output of `git diff`
- Thus flake8 breaks while trying to decode the data passed to its stdin

What to do about it? Modify the `git diff` to only show output for `.py` files like this, maybe?

```
git diff upstream/main -- "*.py" | flake8 --exclude pvlib/version.py --ignore E201,E241,E226,W503,W504 --max-line-length 79 --diff
```
This solved the issue. Thanks!
Ok, going ahead with the merge. Thanks @AdamRJensen!
We greatly appreciate the work on this!