BPCells::linear_operator()
Hi Ben,
Is it OK to use the BPCells::linear_operator() function in production software? Or do you recommend against it?
If it is OK to use it, does it help with matrix multiplication?
I appreciate your consideration and guidance.
Thank you!
Ever grateful, Brent
Hi Brent, what linear_operator() does is reduce the R-level overhead for matrix or vector multiplies on BPCells objects. Normally, BPCells creates+discards the C++ object for every multiply op, which can cause increased memory usage due to garbage collection and potentially some slowness if many fast multiplies are being run. linear_operator() creates and saves a C++-level object wrapped in an R object for repeated use in matrix or vector multiplies to avoid potential overhead.
The main risk of keeping C++-level objects around in R is that coding mistakes are a lot more likely to escalate to crashing the whole R session with a segfault. And open matrix data files will only close once the R garbage collector decides to run, sometimes causing HDF5 locking errors.
I had originally come up with this idea for use calculating SVDs, but since then made a dedicated svds() function that handles everything automatically (and adds support for multi-threading). That eliminates the biggest use-case for vector multiplies I think. If you're doing a randomized SVD and need just a few matrix multiplies, I'd suspect the overhead is not high enough to get much benefit out of linear_operator().
If you use linear_operator() fully confined within a single library function I think it should be pretty safe, and you could add in a call to gc() after deleting the object for good measure. I don't know of an active use-case right now that requires linear_operator() to get good performance, but if you have one I'd be happy to stabilize the function.
What use-case did you have in mind?
Hi Ben,
I feel a bit embarrassed. I am using linear_operator() with a call to the irlba() function, as you surmised. I am aware of your svd() function but I am reluctant to replace irlba() with svd() because of small differences in the results, which can propagate into noticeable differences in UMAP plots, for example. We have users who have prepared large reference data sets using the irlba() function and I don't know how their analyses would be affected by switching svd functions. I suppose a solution is to have an option to choose between the two functions.
Also, I wondered if linear_operator() might give shorter run times for large matrix-matrix multiplications. These would be BPCells x dense matrix multiplications, usually.
I wondered too if it's safe to use linear_operator().
I appreciate greatly your consideration and guidance.
Thank you!
Ever grateful, Brent
Hi Brent, totally understand if you have some use-cases where you want to keep with irlba. That is the original use-case linear_operator() was developed for, and it should be safe to use for that purpose. Thinking more, it's possible speed wasn't even much of a concern and keeping memory usage low might have been the main concern (though memory usage is still quite low even without using linear_operator())
From my own personal testing, I think the variance between running irlba and the spectra-based solver in BPCells is about 0.1% relative difference with default precision settings once you account for sign flipping of PCs. It's reasonable if you have a preference for irlba over the BPCells svds function, and of course identical results will require at least using the same solver and random seed.
For large BPCells x dense matrix multiplications, I doubt that linear_operator() will make much impact, since it just removes a small amount of overhead from each multiply. The longer it takes to do a multiply, the less the overhead will make an impact.
linear_operator() should be safe to use, just I'd recommend minimizing the lifespan of the wrapped C++ objects returned from linear_operator(), since errors they encounter are much more likely to crash the R session (e.g. if a user deletes matrix data from disk but then tries to use a BPCells matrix object)
Hi Ben,
Thank you for the precise explanation. I appreciate greatly your advice!
Thank you!
Ever grateful, Brent