Improve runtime and peak memory usage of PVEngine.run_full_mode
This PR does two things:
- As discussed in #134, the majority of the runtime of `PVEngine.run_full_mode` is spent in the matrix inversion. Especially for large simulations, using a sparse linear solve turns out to be significantly faster than explicitly inverting the full matrix, so this PR replaces the linear algebra operations from `numpy.linalg` with sparse equivalents from `scipy.sparse.linalg`.
- `PVEngine.run_full_mode` creates many large matrices, most of which are only needed in certain parts of the function, resulting in significant (GBs) memory usage for large simulations. Adding `del` statements when these large matrices are no longer needed reduces peak memory usage significantly, especially for large simulations. A minimal sketch of both changes follows this list.
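Here is a minimal sketch of both ideas; the array below is a random stand-in for the actual system built by the engine, and `rhs` is an illustrative name rather than a variable from `engine.py`:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Random stand-in for the linear system solved in run_full_mode.
n = 500
a_mat = np.eye(n) + sparse.random(n, n, density=0.02, random_state=0).toarray()
rhs = np.random.default_rng(0).random(n)

# Before: explicitly invert the full matrix, then multiply.
# q0 = np.linalg.inv(a_mat).dot(rhs)

# After: solve the system directly on a compressed-sparse-column copy,
# which never materializes the dense inverse.
q0 = spsolve(sparse.csc_matrix(a_mat), rhs)

# Dropping references to large intermediates once they are no longer
# needed lets Python reclaim that memory before later allocations.
del a_mat
```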
Here are some runtime and peak memory comparisons of a typical call to the pvlib wrapper function (`pvlib.bifacial.pvfactors_timeseries` in v0.9.0). All values are the ratio of this PR to v1.5.2, so e.g. a value of 0.67 means this PR runs in 2/3 the time, or uses 2/3 the memory, relative to v1.5.2.
Runtime ratio (this PR / v1.5.2):

| n_pvrows \ n_timestamps | 100 | 1000 | 5000 | 10000 |
|---|---|---|---|---|
| 3 | 0.91 | 0.77 | 0.81 | 0.80 |
| 5 | 0.61 | 0.41 | 0.31 | 0.39 |
| 7 | 0.60 | 0.40 | 0.30 | 0.34 |
Peak RAM usage ratio (this PR / v1.5.2):

| n_pvrows \ n_timestamps | 100 | 1000 | 5000 | 10000 |
|---|---|---|---|---|
| 3 | 0.99 | 0.84 | 0.67 | 0.63 |
| 5 | 0.97 | 0.67 | 0.60 | 0.59 |
| 7 | 0.75 | 0.61 | 0.58 | 0.58 |
The above ratios, especially the peak memory values, are relatively stable on my machine. Timing is rather less stable; reducing external CPU load as much as possible and setting the power mode to "best battery" (in an attempt to disable CPU turbo boost and such) seem to help. I would be very interested to see if others can recreate the above results, especially on the timing side -- @spaneja kindly ran some benchmarks for me earlier and the results were not wholly consistent with what I've shown above.
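For anyone who wants to try reproducing these numbers, here is one possible harness using only the standard library (`run_simulation` is a hypothetical stand-in for the actual benchmark case, which isn't shown here; note that `tracemalloc` only tracks allocations made through the Python allocator, so its peak is an approximation of true process memory):

```python
import time
import tracemalloc

def benchmark(fn, *args, **kwargs):
    """Return (runtime_seconds, peak_bytes) for a single call to fn."""
    tracemalloc.start()
    t0 = time.perf_counter()
    fn(*args, **kwargs)
    runtime = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return runtime, peak

# e.g. run under both this PR and v1.5.2 and take the ratio:
# runtime, peak = benchmark(run_simulation, n_pvrows=5, n_timestamps=1000)
```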
Miscellaneous to-do:
- [x] take a closer look at where memory is allocated and where it is used; the current `del` situation is probably still suboptimal
- [x] what's new
- [x] tests?
- [x] make scipy an explicit dependency (right now it's implicit via pvlib)
- [x] document `_sparse_solve`
- [ ] ...and maybe put it somewhere that makes more sense?
- [ ] consider making the sparse approach opt-in via a boolean kwarg or similar?
I suppose this is ready for review! Open questions:
- The private sparse solve function seems a bit out of place in `engine.py`; should it live somewhere else?
- Should the new sparse approach be made opt-in somehow?
- An alternative to using `del` is to break up `run_full_mode` into small functions where the intermediate arrays created in each sub-function go out of scope when that function finishes. I think sub-functions are probably the more pythonic approach, but it would be a bigger change than just throwing in a few `del`s. Any preference? (A sketch combining this with an opt-in kwarg follows this list.)
- Most importantly: can other people replicate the runtime and memory improvements I'm seeing locally?
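To make the last two questions concrete, here is a rough sketch (the helper name and `use_sparse` kwarg are hypothetical, not part of this PR's diff): factoring the solve into its own function means the temporaries it creates go out of scope on return, and a boolean kwarg is one way to make the sparse path opt-in.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def _solve_system(a_mat, rhs, use_sparse=True):
    """Solve a_mat @ x = rhs, optionally via the sparse path.

    Locals created here are freed when the function returns, so no
    explicit `del` statements are needed in the caller.
    """
    if use_sparse:
        return spsolve(sparse.csc_matrix(a_mat), rhs)
    return np.linalg.solve(a_mat, rhs)
```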
You probably already know this, but the complexity/organization/shape of the matrix also affects computation speed: diagonal, banded, and symmetric matrices are easier to solve. If you can permute the matrix so that the non-zero elements are close to the diagonal, you can further reduce the work the solver has to do, although I believe most sparse solvers will try to permute the matrix automatically. See: https://en.m.wikipedia.org/wiki/Band_matrix
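For illustration, scipy exposes a Reverse Cuthill-McKee reordering that does exactly this kind of bandwidth reduction; a toy example on random data (not pvfactors matrices):

```python
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Toy structurally-symmetric sparse system.
A = sparse.random(200, 200, density=0.02, format='csr', random_state=0)
A = (A + A.T + 10 * sparse.eye(200)).tocsr()

# reverse_cuthill_mckee returns a permutation that clusters the
# nonzeros near the diagonal, reducing the matrix bandwidth.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_banded = A[perm][:, perm]
```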
> If you can permute the matrix so that the non-zero elements are close to the diagonal axis you can further improve the computation simplicity. Although I believe most sparse solvers will try to permute the matrix automatically.
`scipy.sparse.linalg.spsolve` seems to include automatic column permuting, but I didn't think I was qualified to try to do better than the default :) For reference, here's what a typical slice of `a_mat` looks like (color means nonzero):
*(figure: sparsity pattern of a typical `a_mat` slice; colored cells are nonzero)*
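For completeness, the ordering `spsolve` uses is controlled by its `permc_spec` argument; `'COLAMD'` is the default it applies when nothing is passed. A tiny self-contained example:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

A = sparse.csc_matrix(np.eye(4) + np.diag([0.1, 0.1, 0.1], k=1))
b = np.ones(4)

# permc_spec selects SuperLU's fill-reducing column ordering.
x = spsolve(A, b, permc_spec='COLAMD')
```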
Passing empty timeseries inputs currently works on master but didn't work here; 480b350 fixes that and adds a relevant test. Not sure how valuable it is to support that use case, but might as well if it's easy...
Ping @campanelli-sunpower -- these changes would be nice to have in the main branch for a project @spaneja and I are doing, any chance this could get merged soonish?
Putting this back to draft -- after further experimentation I'm no longer confident that this PR is the best approach. I will continue the discussion back in #134.