pvlib-python icon indicating copy to clipboard operation
pvlib-python copied to clipboard

Add functions to read and retrieve SolarAnywhere irradiance data

Open AdamRJensen opened this issue 2 years ago • 12 comments

  • [x] Closes #1310
  • [x] I am familiar with the contributing guidelines
  • [x] Tests added
  • [x] Updates entries in docs/sphinx/source/reference for API changes.
  • [x] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • [x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • [x] Pull request is nearly complete and ready for detailed review.
  • [x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Description This PR adds functions for reading and retrieving SolarAnywhere data as discussed in #1310. SolarAnywhere is a commercial dataset, although it provides irradiance data for a few locations for free.

Minor comments The SolarAnywhere API is asynchronous, hence two requests are made for each data retrieval.

The get_solaranywhere function is currently not designed to handle annual and monthly data, as this seems less relevant to pvlib users.

AdamRJensen avatar Jul 22 '22 03:07 AdamRJensen

@KWagnerCPR It would be great if you could test the functions and give some feedback on the default values. :smile:

AdamRJensen avatar Jul 22 '22 03:07 AdamRJensen

@KWagnerCPR It would be great if you could test the functions and give some feedback on the default values. 😄

@AdamRJensen Yes, I'll have a few people on our team test it out and get back to you. Which default values are you referring to?

KWagnerCPR avatar Jul 22 '22 18:07 KWagnerCPR

@KWagnerCPR In regards to default values I'm thinking of these:

def get_solaranywhere(latitude, longitude, api_key, start=None, end=None,
                      time_resolution=60, spatial_resolution=0.1,
                      true_dynamics=False, source='SolarAnywhereLatest',
                      variables=DEFAULT_VARIABLES, missing_data='Omit',
                      url=URL, map_variables=True, max_response_time=300)

Also, I have chosen not to parse any annual and monthly data, as that seems to be outside the scope of pvlib and would just unnecessarily complicate the function.

Furthermore, the get_solaranywhere function currently does not support retrieval of "probability of exceedance" time series, as the public license does not allow me to retrieve those.

AdamRJensen avatar Jul 22 '22 19:07 AdamRJensen

@AdamRJensen

In terms of default values, I would recommend the following: time_resolution=60, spatial_resolution=0.01, true_dynamics=False, source='SolarAnywhereLatest', missing_data='FillAverage',

With SolarAnywhere Public, users get access to 1 km (0.01 x 0.01 degree) spatial resolution data, so I think it makes sense to set the default as 0.01 if our focus is on time series requests rather than TGY requests. Also, using the "FillAverage" default for missing data will ensure periods of missing satellite imagery are filled with long term averages for those times, which is likely preferable.

You should be able to request PXX data as a part of the SolarAnywhere public license. The issue might have been that you had the spatial resolution set to 0.1 when you requested it. You also have to include "ProbabilityOfExceedance": 90, in the request.

I think it makes sense not to request the monthly and annual totals at this time.

KWagnerCPR avatar Jul 22 '22 20:07 KWagnerCPR

@KWagnerCPR I have updated the default values accordingly. Thanks for the feedback! :smile:

Also, I figured out how to work with PXX data, so I've made some modifications to handle that and added a keyword called probability_of_exceedance.

AdamRJensen avatar Jul 23 '22 08:07 AdamRJensen

@pvlib/pvlib-maintainer I am thinking of developing primarily mock tests for the get_solaranywhere function as the SolarAnywhere data retrieval is slow (minutes). However, I would like to include at least one real request of historical data (just two days of data, which would make the time reasonable). Thus, unless I hear any objections I will go ahead and register the pvlib email for a free SolarAnywhere account and add a GitHub Secret called SOLARANYWHERE_API_KEY. If there is a desire to avoid live tests at all, let me know. FYI I imagine writing mock tests for this will be a real pain.

AdamRJensen avatar Jul 23 '22 20:07 AdamRJensen

@AdamRJensen We've tested these functions and they are working well!

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

KWagnerCPR avatar Jul 26 '22 21:07 KWagnerCPR

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

@KWagnerCPR Thank you for catching this error. I had forgotten to raise the error, hence nothing was happening - it was been corrected.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

Once the functions get merged and we make a new release of pvlib, the documentation will be available publicly. We're releasing version 0.9.2 within a few weeks, but I can't promise that this pull request will be merged by then.

AdamRJensen avatar Jul 26 '22 23:07 AdamRJensen

One thing I noticed is that when I request time series data (WeatherDataSource = SolarAnywhereLatest) without specifying a start or end time, I don't get the error message you specify in the script: "'When requesting non-TMY data, specifying start and 'end` is required.'". Instead I get: AttributeError: 'NoneType' object has no attribute 'tz'. Not a huge deal. Just wanted to point it out.

@KWagnerCPR Thank you for catching this error. I had forgotten to raise the error, hence nothing was happening - it was been corrected.

I think these will be really helpful for our customers as well as new SolarAnywhere users. I'd love to include links to these in our developers documentation. Do you know if these will get documented in the official pvlib documentation?: https://pvlib-python.readthedocs.io/en/stable/index.html

Once the functions get merged and we make a new release of pvlib, the documentation will be available publicly. We're releasing version 0.9.2 within a few weeks, but I can't promise that this pull request will be merged by then.

@AdamRJensen Thank you! I'll stay tuned.

KWagnerCPR avatar Jul 27 '22 16:07 KWagnerCPR

When downloading data from the SolarAnywhere website, missing data may either be filled with average data or represented as missing using one of three options: image

Missing values are by default correctly converted to Numpy nan. Similarly, I belive that the 'NaN' option would also be correctly converted to Numpy nan since we're using pd.read_csv (though I haven't tested it). However, -999 will have to be converted to numpy nan by the read_solaranywhere function.

@KWagnerCPR Do you have a file or a know station/time period where there is missing data that I could use for testing?

AdamRJensen avatar Aug 16 '22 10:08 AdamRJensen

The remote test failures are caused by this line:

solaranywhere_api_key = os.environ["SOLARANYWHERE_API_KEY"]

which makes me assume that the API key added to the GitHub Secrets by @kanderso-nrel isn't named what I think they are...

AdamRJensen avatar Aug 16 '22 21:08 AdamRJensen

I think it is just the annoying but secure behavior of pull-request-target not respecting yml changes in the PR -- your update to include the SA key in the environment isn't taking effect because the action is defined by the .github/workflows/pytest-remote-data.yml file on master, not this PR branch. I guess the cleanest workaround is to submit a separate PR to change pytest-remote-data.yml, merge that, then merge master into this PR?

kandersolar avatar Aug 16 '22 22:08 kandersolar

@AdamRJensen I was taking a look at this section of the pvlib documentation: [https://pvlib-python.readthedocs.io/en/stable/reference/iotools.html] and I didn't see the documentation for these functions to retrieve and read SolarAnywhere data. Has the official documentation been posted for these?

KWagnerCPR avatar Nov 27 '23 20:11 KWagnerCPR

@KWagnerCPR they won't appear in the stable docs until the PR is merged. You can preview the docs here: https://pvlib-python--1497.org.readthedocs.build/en/1497/

cwhanse avatar Nov 27 '23 20:11 cwhanse

@cwhanse I see, thank you for sharing that link! Looks like this item was slated for the 0.10.2 release on September 6, but was then deferred. Do you have an update on when the next release may take place?

KWagnerCPR avatar Nov 27 '23 22:11 KWagnerCPR

@KWagnerCPR I've added this to the v0.10.3 milestone, which is scheduled to be released mid-December. But (as seen in the previous milestones for this PR), that's just an aspiration, not a guarantee. Getting this PR merged depends on reviewer availability, which is in rather short supply at the moment.

kandersolar avatar Nov 28 '23 17:11 kandersolar

@kandersolar Understood. Thank you for this background info!

KWagnerCPR avatar Nov 28 '23 17:11 KWagnerCPR

@kandersolar Could you help with the flake8 issue - I can't really tell what it is?

AdamRJensen avatar Dec 19 '23 12:12 AdamRJensen

@kandersolar Could you help with the flake8 issue - I can't really tell what it is?

Do you get the error if the csv files are read with pandas?

cwhanse avatar Dec 19 '23 14:12 cwhanse

@kandersolar Could you help with the flake8 issue - I can't really tell what it is?

I think the problem is this:

  1. To use flake8's --diff option, we are piping git diff into flake8 via stdin rather than letting flake8 discover and read files itself
  2. That means flake8 has to identify .py files by parsing the git diff text provided on stdin, which means it wants to decode the stdin text to UTF-8 first
  3. The CSV data file has that troublesome trademark character that cannot be decoded as UTF-8. This character is included in the output of git diff
  4. Thus flake8 breaks while trying to decode the data passed to its stdin

What to do about it? Modify the git diff to only show output for .py files like this, maybe?

git diff upstream/main -- "*.py" | flake8 --exclude pvlib/version.py --ignore E201,E241,E226,W503,W504 --max-line-length 79 --diff

kandersolar avatar Dec 19 '23 14:12 kandersolar

I think the problem is this:

  1. To use flake8's --diff option, we are piping git diff into flake8 via stdin rather than letting flake8 discover and read files itself
  2. That means flake8 has to identify .py files by parsing the git diff text provided on stdin, which means it wants to decode the stdin text to UTF-8 first
  3. The CSV data file has that troublesome trademark character that cannot be decoded as UTF-8. This character is included in the output of git diff
  4. Thus flake8 breaks while trying to decode the data passed to its stdin

What to do about it? Modify the git diff to only show output for .py files like this, maybe?

git diff upstream/main -- "*.py" | flake8 --exclude pvlib/version.py --ignore E201,E241,E226,W503,W504 --max-line-length 79 --diff

This solved the issue. Thanks!

AdamRJensen avatar Dec 19 '23 15:12 AdamRJensen

Ok, going ahead with the merge. Thanks @AdamRJensen!

kandersolar avatar Dec 20 '23 17:12 kandersolar

We greatly appreciate the work on this!

KWagnerCPR avatar Dec 20 '23 23:12 KWagnerCPR