Improve runtime and peak memory usage of PVEngine.run_full_mode
This PR does two things:
- As discussed in #134, the majority of the runtime of `PVEngine.run_full_mode` is spent in the matrix inversion. Especially for large simulations, using a sparse linear solve turns out to be significantly faster than explicitly inverting the full matrix, so this PR replaces the linear algebra operations from `numpy.linalg` with sparse equivalents from `scipy.sparse.linalg`.
- `PVEngine.run_full_mode` creates many large matrices, most of which are only needed in certain parts of the function, resulting in significant (GBs) memory usage for large simulations. Adding `del` statements when these large matrices are no longer needed reduces peak memory usage significantly, especially for large simulations. A minimal sketch of both changes follows this list.
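Here is a minimal sketch of both ideas; the array below is a random stand-in for the actual system built by the engine, and `rhs` is an illustrative name rather than a variable from `engine.py`:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Random stand-in for the linear system solved in run_full_mode.
n = 500
a_mat = np.eye(n) + sparse.random(n, n, density=0.02, random_state=0).toarray()
rhs = np.random.default_rng(0).random(n)

# Before: explicitly invert the full matrix, then multiply.
# q0 = np.linalg.inv(a_mat).dot(rhs)

# After: solve the system directly on a compressed-sparse-column copy,
# which never materializes the dense inverse.
q0 = spsolve(sparse.csc_matrix(a_mat), rhs)

# Dropping references to large intermediates once they are no longer
# needed lets Python reclaim that memory before later allocations.
del a_mat
```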
Here are some runtime and peak memory comparisons of a typical call to the pvlib wrapper function (`pvlib.bifacial.pvfactors_timeseries` in v0.9.0). All values are the ratio of this PR to v1.5.2, so e.g. a value of 0.67 means this PR runs in 2/3 the time, or uses 2/3 the memory, relative to v1.5.2.
Runtime ratio (this PR / v1.5.2):

| n_pvrows \ n_timestamps | 100 | 1000 | 5000 | 10000 |
|---|---|---|---|---|
| 3 | 0.91 | 0.77 | 0.81 | 0.80 |
| 5 | 0.61 | 0.41 | 0.31 | 0.39 |
| 7 | 0.60 | 0.40 | 0.30 | 0.34 |
Peak RAM usage ratio (this PR / v1.5.2):

| n_pvrows \ n_timestamps | 100 | 1000 | 5000 | 10000 |
|---|---|---|---|---|
| 3 | 0.99 | 0.84 | 0.67 | 0.63 |
| 5 | 0.97 | 0.67 | 0.60 | 0.59 |
| 7 | 0.75 | 0.61 | 0.58 | 0.58 |
The above ratios, especially the peak memory values, are relatively stable on my machine. Timing is rather less stable; reducing external CPU load as much as possible and setting the power mode to "best battery" (in an attempt to disable CPU turbo boost and such) seem to help. I would be very interested to see if others can recreate the above results, especially on the timing side -- @spaneja kindly ran some benchmarks for me earlier and the results were not wholly consistent with what I've shown above.
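For anyone who wants to try reproducing these numbers, here is one possible harness using only the standard library (`run_simulation` is a hypothetical stand-in for the actual benchmark case, which isn't shown here; note that `tracemalloc` only tracks allocations made through the Python allocator, so its peak is an approximation of true process memory):

```python
import time
import tracemalloc

def benchmark(fn, *args, **kwargs):
    """Return (runtime_seconds, peak_bytes) for a single call to fn."""
    tracemalloc.start()
    t0 = time.perf_counter()
    fn(*args, **kwargs)
    runtime = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return runtime, peak

# e.g. run under both this PR and v1.5.2 and take the ratio:
# runtime, peak = benchmark(run_simulation, n_pvrows=5, n_timestamps=1000)
```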
Miscellaneous to-do:
- [x] take a closer look at where memory is allocated and where it is used; the current `del` situation is probably still suboptimal
- [x] what's new
- [x] tests?
- [x] make scipy an explicit dependency (right now it's implicit via pvlib)
- [x] document `_sparse_solve`
- [ ] ...and maybe put it somewhere that makes more sense?
- [ ] consider making the sparse approach opt-in via a boolean kwarg or similar?
I suppose this is ready for review! Open questions:
- The private sparse solve function seems a bit out of place in `engine.py`; should it live somewhere else?
- Should the new sparse approach be made opt-in somehow?
- An alternative to using `del` is to break up `run_full_mode` into small functions where the intermediate arrays created in each sub-function go out of scope when that function finishes. I think sub-functions are probably the more pythonic approach, but it would be a bigger change than just throwing in a few `del`s. Any preference? (A sketch combining this with an opt-in kwarg follows this list.)
- Most importantly: can other people replicate the runtime and memory improvements I'm seeing locally?
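To make the last two questions concrete, here is a rough sketch (the helper name and `use_sparse` kwarg are hypothetical, not part of this PR's diff): factoring the solve into its own function means the temporaries it creates go out of scope on return, and a boolean kwarg is one way to make the sparse path opt-in.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def _solve_system(a_mat, rhs, use_sparse=True):
    """Solve a_mat @ x = rhs, optionally via the sparse path.

    Locals created here are freed when the function returns, so no
    explicit `del` statements are needed in the caller.
    """
    if use_sparse:
        return spsolve(sparse.csc_matrix(a_mat), rhs)
    return np.linalg.solve(a_mat, rhs)
```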
You probably already know this, but the complexity/organization/shape of the matrix also affects computation speed: diagonal, banded, and symmetric matrices are easier to solve. If you can permute the matrix so that the non-zero elements are close to the diagonal, you can further reduce the work the solver has to do, although I believe most sparse solvers will try to permute the matrix automatically. See: https://en.m.wikipedia.org/wiki/Band_matrix
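For illustration, scipy exposes a Reverse Cuthill-McKee reordering that does exactly this kind of bandwidth reduction; a toy example on random data (not pvfactors matrices):

```python
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Toy structurally-symmetric sparse system.
A = sparse.random(200, 200, density=0.02, format='csr', random_state=0)
A = (A + A.T + 10 * sparse.eye(200)).tocsr()

# reverse_cuthill_mckee returns a permutation that clusters the
# nonzeros near the diagonal, reducing the matrix bandwidth.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_banded = A[perm][:, perm]
```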
> If you can permute the matrix so that the non-zero elements are close to the diagonal axis you can further improve the computation simplicity. Although I believe most sparse solvers will try to permute the matrix automatically.
`scipy.sparse.linalg.spsolve` seems to include automatic column permuting, but I didn't think I was qualified to try to do better than the default :) For reference, here's what a typical slice of `a_mat` looks like (color means nonzero):
*(figure: sparsity pattern of a typical `a_mat` slice; colored cells are nonzero)*
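For completeness, the ordering `spsolve` uses is controlled by its `permc_spec` argument; `'COLAMD'` is the default it applies when nothing is passed. A tiny self-contained example:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

A = sparse.csc_matrix(np.eye(4) + np.diag([0.1, 0.1, 0.1], k=1))
b = np.ones(4)

# permc_spec selects SuperLU's fill-reducing column ordering.
x = spsolve(A, b, permc_spec='COLAMD')
```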
Passing empty timeseries inputs currently works on master but didn't work here; 480b350 fixes that and adds a relevant test. Not sure how valuable it is to support that use case, but might as well if it's easy...
Ping @campanelli-sunpower -- these changes would be nice to have in the main branch for a project @spaneja and I are doing, any chance this could get merged soonish?
Putting this back to draft -- after further experimentation I'm no longer confident that this PR is the best approach. I will continue the discussion back in #134.