loo
Speed up loo calcs
Write offending (slow) underlying functions in C++
Which are the offending routines?
I'm not sure which parts of the code would benefit most from a C++ implementation. I haven't had time to look into it. Probably need to do some profiling.
On Wednesday, September 14, 2016, Avraham Adler wrote:
Which are the offending routines?
I had a little free time today, so I forked loo, created an Rcpp branch, and tried my hand at something simple: moving gpdfit (and thus lx) to C++ using Rcpp. By my rough benchmarking, it's about 60% faster. Of course, it means shlepping the entire Rcpp interface into loo, but it may be worth it. If you're interested, it's here: https://github.com/aadler/loo/tree/Rcpp
If I have time I'll see what else I can do.
If you really want to get wild with speed, you can start dropping `#pragma omp parallel` directives in various places :)
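For anyone following along, here is a minimal sketch of the kind of loop where an OpenMP pragma could be dropped in. It is not taken from loo or from the Rcpp branch; `col_means` is a hypothetical helper, and the matrix is assumed to be stored column-major in a flat vector (the layout R uses). The pragma is guarded so the code still compiles, and simply runs serially, when OpenMP is not enabled.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of loo): mean of each column of an
// n_rows x n_cols matrix stored column-major in a flat vector.
// Assumes n_rows > 0 and x.size() == n_rows * n_cols.
std::vector<double> col_means(const std::vector<double>& x,
                              std::size_t n_rows, std::size_t n_cols) {
    std::vector<double> out(n_cols, 0.0);
    const long n = static_cast<long>(n_cols);

    // Each iteration reads one column and writes only out[j], so the
    // iterations are independent and can safely run in parallel.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long j = 0; j < n; ++j) {
        const double* col = x.data() + static_cast<std::size_t>(j) * n_rows;
        double sum = 0.0;
        for (std::size_t i = 0; i < n_rows; ++i) {
            sum += col[i];
        }
        out[static_cast<std::size_t>(j)] = sum / static_cast<double>(n_rows);
    }
    return out;
}
```

With GCC this would be compiled with `-fopenmp` to enable the pragma. Note that the input is only read through a const reference, so the threads share it rather than each holding a copy.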
@aadler Thanks a ton for doing this. 60% is a pretty big improvement! I haven't really had a chance to give it a look yet, but I just realized I hadn't responded at all. Hopefully I'll be able to spend some more time on this soon.
My pleasure. I didn't create a pull request since I personally do not use roxygen in my packages and was afraid to mess things up. In my experience, I've found that the [x]apply functions are ones well suited for conversion to compiled code.
Now, if you don't want to use Rcpp, we can go the C/Fortran95 route, but I'd recommend for loo to use Rcpp since the entire Stan ecosystem is in Rcpp.
Also, to be honest, I spent very little time trying to optimize the C++; rather, I followed the R code pretty much up a tree and off a cliff trying to ensure compatibility. Once you have the C++ code in the loo package, I'm sure y'all can streamline it much better than I could have :)
Related, but less important: I had some trouble with the WAIC calculations crashing (something in the colLogSumExps code on large datasets), so I rewrote that part in C++. It is about 10% faster, and so I thought I'd share it. It is here, and I added a pull request to @aadler's version as well: https://github.com/reuning/loo
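To illustrate what a column-wise log-sum-exp looks like in C++, here is a minimal sketch. It is not the code from the fork linked above; `col_log_sum_exps` is a hypothetical name, and the input is assumed to be a column-major matrix in a flat vector. It uses the standard trick of subtracting each column's maximum before exponentiating so that large values do not overflow.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch (not the fork's code): log(sum(exp(column))) for
// each column of an n_rows x n_cols column-major matrix.
// NaN inputs are not handled specially.
std::vector<double> col_log_sum_exps(const std::vector<double>& x,
                                     std::size_t n_rows, std::size_t n_cols) {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    std::vector<double> out(n_cols, neg_inf);
    if (n_rows == 0) return out;  // log of an empty sum is -Inf

    for (std::size_t j = 0; j < n_cols; ++j) {
        const double* col = x.data() + j * n_rows;
        const double m = *std::max_element(col, col + n_rows);
        if (!std::isfinite(m)) {
            // All entries are -Inf, or some entry is +Inf.
            out[j] = m;
            continue;
        }
        // Shift by the maximum so every exponent is <= 0.
        double s = 0.0;
        for (std::size_t i = 0; i < n_rows; ++i) {
            s += std::exp(col[i] - m);
        }
        out[j] = m + std::log(s);
    }
    return out;
}
```

A naive `log(sum(exp(x)))` overflows to infinity for a column such as (1000, 1000), whereas the shifted version returns 1000 + log(2).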
I'm relatively new to this all so caveats apply. Hope this helps in some small way.
Sorry it’s taken so long to come back to this. We’ve been focused on other improvements rather than speed recently, but now that loo 2.0 is out I think we’re ready to more seriously consider using some C++ in the backend iff the improvements are substantial enough to warrant maintaining the additional code and including an Rcpp dependency. Anyone interested in resuming this work?
Crazy busy at work, but I can probably try in a week or three. Have you looked into the code already posted?
Avi
On Sun, Apr 22, 2018 at 12:11 AM Jonah Gabry wrote:
Sorry it’s taken so long to come back to this. We’ve been focused on other improvements rather than speed recently, but now that loo 2.0 is out I think we’re ready to more seriously consider using some C++ in the backend iff the improvements are substantial enough to warrant maintaining the additional code and including an Rcpp dependency. Anyone interested in resuming this work?
Ok no rush! I did look at the code, but it was a long time ago so I will definitely need to refresh my memory.
For me, the absurd memory usage required by multiple threads prevented me from using more than 1 core. My laptop has about 12 GiB RAM available, but that is easily exhausted by loo and causes out-of-memory exceptions.
Whoever is working on optimization: please ensure that the memory is not copied when multiple cores are enabled. There is no need to copy it because most of it is never modified. On Linux, forked workers share pages with the parent via copy-on-write (and true threads share the address space outright), so memory usage with and without parallelism should be about the same.
I suspect that Rcpp will copy data unnecessarily. Therefore, you may not be able to use some of the Rcpp helper functions.
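One way to avoid the copies described above, sketched under assumptions: hand each worker a small non-owning, read-only view of the matrix plus a range of columns to process, so that all workers read the same buffer. The names `ConstMatrixView` and `sum_columns` are hypothetical and this is not how loo currently parallelizes; it only shows the idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning, read-only view of a column-major matrix.
// Copying the view copies three words, never the underlying data.
struct ConstMatrixView {
    const double* data;
    std::size_t n_rows;
    std::size_t n_cols;

    // Pointer to the first element of column j (no bounds checking).
    const double* col(std::size_t j) const { return data + j * n_rows; }
};

// Worker routine: sums columns [begin, end) and writes each result to
// out[j]. Workers given disjoint column ranges never write to the same
// element, and the input is only ever read.
void sum_columns(ConstMatrixView v, std::size_t begin, std::size_t end,
                 double* out) {
    for (std::size_t j = begin; j < end; ++j) {
        const double* c = v.col(j);
        double s = 0.0;
        for (std::size_t i = 0; i < v.n_rows; ++i) {
            s += c[i];
        }
        out[j] = s;
    }
}
```

Each thread (or OpenMP chunk) would call `sum_columns` with the same view and its own column range, so peak memory stays close to the single-core case apart from the small output vector.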
How is it being multithreaded? Is OpenMP being used?