Convert SPA native Python loops to numpy operations
Hi,
I'd like to propose this change to speed up the "nrel_numpy" version of pvlib.solarposition.get_solarposition().
In this PR I replace some of the native Python loops with equivalent numpy slicing and sums. In my local testing (using on the order of 10 input times) I found that this version takes about 45% less time than the current version. It also takes about 20% less time than the current "nrel_numba" version.
When I modified the sum_mult_cos_add_mult() and longitude_obliquity_nutation() functions, numba could no longer infer the datatypes of these functions. Hence these functions now branch on the global USE_NUMBA setting, which allows numba to infer the datatypes whilst allowing the numpy version to avoid the native Python loops.
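To give a feel for the kind of change, here is a hypothetical sketch (not pvlib's actual code; `sum_terms` and the `USE_NUMBA` handling here are illustrative only). The SPA periodic terms have the form sum_i A_i * cos(B_i + C_i * x), so the loop and vectorized forms look roughly like:

```python
import numpy as np

USE_NUMBA = False  # stand-in for pvlib.spa's global flag

if USE_NUMBA:
    from numba import njit

    @njit
    def sum_terms(terms, x):
        # Plain loop: numba can infer the dtypes and compile this.
        total = 0.0
        for i in range(terms.shape[0]):
            total += terms[i, 0] * np.cos(terms[i, 1] + terms[i, 2] * x)
        return total
else:
    def sum_terms(terms, x):
        # Vectorized equivalent: column slices plus one reduction,
        # avoiding the native Python loop in the numpy code path.
        return (terms[:, 0] * np.cos(terms[:, 1] + terms[:, 2] * x)).sum()
```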
- [x] I am familiar with the contributing guidelines
- [ ] Tests added
- [ ] Updates entries in `docs/sphinx/source/reference` for API changes.
- [ ] Adds description and name entries in the appropriate "what's new" file in `docs/sphinx/source/whatsnew` for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
- [ ] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
- [ ] Pull request is nearly complete and ready for detailed review.
- [ ] Maintainer: Appropriate GitHub Labels (including `remote-data`) and Milestone are assigned to the Pull Request and linked Issue.
@dfulu is this ready for the CI?
Hi @cwhanse, yes it is
@cwhanse I think the test failure on Windows + Python 3.9 is unrelated to the changes I've made. I'm not sure how to proceed.
@dfulu the py39 failure is unrelated.
The portions of code activated by USE_NUMBA are shown as untested. That looks like something that needs to be fixed on pvlib's side. I'm not the right person to determine what that is.
Although marked as successful, the benchmark report indicates that NUMBA couldn't be imported in the py39 environment for the speed checks.
@cwhanse Thanks for the reply, I'll maybe wait until someone else is available to look. In the meantime I'll see if there is another way instead of using the USE_NUMBA flag.
Nice @dfulu! Have you checked how the vectorized version affects peak memory usage?
> Although marked as successful, the benchmark report indicates that NUMBA couldn't be imported in the py39 environment for the speed checks.
Thanks for noticing this. It's an issue on main too. Maybe some incompatibility between numba and numpy versions getting installed in the benchmark environment?
> The portions of code activated by USE_NUMBA are shown as untested. That looks like something that needs to be fixed on pvlib's side. I'm not the right person to determine what that is.
Code coverage isn't (can't be) recorded for code that gets run via numba. There's no workaround as far as I'm aware, but the last time I checked was a couple years ago now.
> Nice @dfulu! Have you checked how the vectorized version affects peak memory usage?
@kandersolar, I didn't think of that actually.
I've had a look, and the vectorised version definitely uses more memory. I did a bit of testing of the solar_position() function with 10k times: with this version, peak memory usage was about 23MB, while the main branch version used about 10MB at its peak. For 100k datetimes these become 165MB and 41MB respectively. So this new version definitely scales less well in memory footprint. I think both scale as O(N), but with different coefficients.
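For anyone wanting to reproduce, the measurement was along these lines (a rough sketch; the location is arbitrary and absolute numbers will vary with machine and library versions):

```python
import tracemalloc
import pandas as pd
import pvlib

# 100k minutely timestamps; numpy allocations are tracked by tracemalloc
times = pd.date_range('2020-01-01', periods=100_000, freq='min', tz='UTC')

tracemalloc.start()
pvlib.solarposition.get_solarposition(times, 40.0, -105.0,
                                      method='nrel_numpy')
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f'peak allocation: {peak / 1e6:.0f} MB')
```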
I've also done a bit more profiling and found that this version only beats the main branch up to around 1000 input times, after which they converge. It also only beats the numba version at around O(10) input times, which is exactly the order I was testing at. So it's not as exciting as it first seemed.
> I've also done a bit more profiling and found that this version only beats the main branch up to around 1000 input times, after which they converge.
I am seeing this PR being rather slower at medium-sized N (10k times), perhaps 1.5-2x the runtime compared with main. Since ~10k times is a primary pvlib use case (one year of hourly values), a slowdown there is a problem. With that, and the added code complexity of the `if USE_NUMBA:` branches, it's not looking like it makes sense to merge this PR.
However, I was running the comparison on a noisy laptop, so even though the results seemed relatively repeatable, it's possible my brief testing isn't representative. I'd be interested in seeing more results if anyone is interested in trying it out.
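For anyone who wants to try it out, a sweep along these lines run on both this branch and main would be useful (a rough sketch; the location is arbitrary and timings are machine-dependent):

```python
import timeit
import pandas as pd
import pvlib

# Time the nrel_numpy method across a range of input sizes
for n in (10, 100, 1_000, 10_000, 100_000):
    times = pd.date_range('2020-01-01', periods=n, freq='min', tz='UTC')
    best = min(timeit.repeat(
        lambda: pvlib.solarposition.get_solarposition(
            times, 40.0, -105.0, method='nrel_numpy'),
        number=3, repeat=3)) / 3
    print(f'N={n:>6}: {best * 1e3:9.2f} ms')
```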