Speed up loo calcs

Open · rtrangucci opened this issue 9 years ago · 12 comments

Write offending (slow) underlying functions in C++

rtrangucci, Aug 18 '16

Which are the offending routines?

aadler, Sep 14 '16

I'm not sure which parts of the code would benefit most from a C++ implementation. I haven't had time to look into it. Probably need to do some profiling.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/33#issuecomment-247034388, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q4nrqG5n7ekRpErD0oJimXAgoKHxks5qqAbXgaJpZM4Jnv4y .

jgabry, Sep 14 '16

I had a little free time today, so I forked loo, created an Rcpp branch, and tried my hand at something simple: moving gpdfit (and thus lx) to C++ using Rcpp. By my rough benchmarking, it's about 60% faster. Of course, it means shlepping the entire Rcpp interface into loo, but it may be worth it. If you're interested, it's here: https://github.com/aadler/loo/tree/Rcpp
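For readers who haven't looked at the branch, here is a minimal standalone sketch of why a routine like `lx` benefits from compiled code. It assumes `lx` has the R form `a <- -a; k <- sapply(a, function(y) mean(log1p(y * x))); log(a / k) - k - 1` (check this against the actual loo source). The name `lx_cpp` and the plain `std::vector` interface (instead of Rcpp types) are hypothetical, chosen so the sketch compiles without R.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical C++ port of the R helper lx(a, x), ASSUMING the R form
//   a <- -a
//   k <- sapply(a, function(y) mean(log1p(y * x)))
//   log(a / k) - k - 1
// The sapply closure becomes a plain nested loop with no temporary vectors.
std::vector<double> lx_cpp(const std::vector<double>& a,
                           const std::vector<double>& x) {
    assert(!x.empty());  // mean() over an empty x is undefined
    std::vector<double> out(a.size());
    const double n = static_cast<double>(x.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double b = -a[i];              // a <- -a
        double sum = 0.0;
        for (double xj : x) {
            sum += std::log1p(b * xj);       // log1p(y * x)
        }
        const double k = sum / n;            // mean(...)
        out[i] = std::log(b / k) - k - 1.0;  // log(a / k) - k - 1
    }
    return out;
}
```

In an Rcpp version the signature would take `Rcpp::NumericVector` arguments instead, but the loop body would be essentially the same.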

If I have time I'll see what else I can do.

If you really want to get wild with speed, you can start dropping `#pragma omp parallel` directives in various places :)
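As an illustration of where such a pragma fits (a sketch, not loo code): the per-observation sample variance of an S × N log-likelihood matrix, whose sum over observations is WAIC's effective number of parameters p_waic. Each column is independent, so the outer loop can be split across threads with no locking. The function name `col_vars` and the column-major layout (how R stores matrices) are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-column sample variance (denominator S - 1) of an S x N matrix stored
// column-major. Columns are independent, so the outer loop is split across
// OpenMP threads. Without -fopenmp the pragma is ignored and the loop runs
// serially with identical results.
std::vector<double> col_vars(const std::vector<double>& ll,
                             std::size_t S, std::size_t N) {
    assert(S >= 2 && ll.size() == S * N);
    std::vector<double> out(N);
    // Signed loop index for compatibility with older OpenMP (2.0) compilers.
    #pragma omp parallel for
    for (long j = 0; j < static_cast<long>(N); ++j) {
        const std::size_t jj = static_cast<std::size_t>(j);
        const double* col = ll.data() + jj * S;  // read-only, never copied
        double mean = 0.0;
        for (std::size_t s = 0; s < S; ++s) mean += col[s];
        mean /= static_cast<double>(S);
        double ss = 0.0;
        for (std::size_t s = 0; s < S; ++s) {
            const double d = col[s] - mean;
            ss += d * d;
        }
        out[jj] = ss / static_cast<double>(S - 1);  // each thread owns its slot
    }
    return out;
}
```

In an R package, OpenMP is switched on through `src/Makevars`, e.g. `PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)` and `PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)`, as described in Writing R Extensions.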

aadler, Dec 08 '16

@aadler Thanks a ton for doing this. 60% is a pretty big improvement! I haven't really had a chance to give it a look yet, but I just realized I hadn't responded at all. Hopefully I'll be able to spend some more time on this soon.

jgabry, Dec 12 '16

My pleasure. I didn't create a pull request since I personally do not use roxygen in my packages and was afraid to mess things up. In my experience, the [x]apply functions are the parts best suited for conversion to compiled code.

Now, if you don't want to use Rcpp, we can go the C/Fortran95 route, but I'd recommend that loo use Rcpp, since the entire Stan ecosystem is built on Rcpp.

aadler, Dec 12 '16

Also, to be honest, I spent very little time trying to optimize the C++; rather, I followed the R code pretty much up a tree and off a cliff trying to ensure compatibility. Once you have the C++ code in the loo package, I'm sure y'all can streamline it much better than I could have :)

aadler, Dec 12 '16

Related, but less important: I had some trouble with the WAIC calculations crashing (something in the colLogSumExps code on large datasets). I rewrote this part in C++; it is about 10% faster, so I thought I'd share it. It is here, and I also opened a pull request against @aadler's version: https://github.com/reuning/loo
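The cause of the crash isn't stated above, but independent of that, any compiled column-wise log-sum-exp should use the max-shift trick so that `exp()` never overflows on large log-likelihood values. Below is a minimal sketch of that technique over a column-major S × N matrix; it is not reuning's code, and the name `col_log_sum_exp` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Column-wise log(sum(exp(x))) for an S x N matrix stored column-major.
// Subtracting each column's max m keeps every exp() argument <= 0, so no
// term overflows; m is added back at the end.
std::vector<double> col_log_sum_exp(const std::vector<double>& x,
                                    std::size_t S, std::size_t N) {
    assert(S >= 1 && x.size() == S * N);
    std::vector<double> out(N);
    for (std::size_t j = 0; j < N; ++j) {
        const double* col = x.data() + j * S;
        double m = col[0];
        for (std::size_t s = 1; s < S; ++s) m = std::fmax(m, col[s]);
        if (std::isinf(m)) {
            // All -Inf gives -Inf; any +Inf gives +Inf. Shifting by an
            // infinite m would produce NaN, so return m directly.
            out[j] = m;
            continue;
        }
        double sum = 0.0;
        for (std::size_t s = 0; s < S; ++s) sum += std::exp(col[s] - m);
        out[j] = m + std::log(sum);
    }
    return out;
}
```

A naive `log(sum(exp(x)))` on a column of values around 1000 overflows to `Inf`; the shifted version returns `1000 + log(2)` for a column `{1000, 1000}`.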

I'm relatively new to this all so caveats apply. Hope this helps in some small way.

reuning, Jun 14 '17

Sorry it’s taken so long to come back to this. We’ve been focused on other improvements rather than speed recently, but now that loo 2.0 is out I think we’re ready to more seriously consider using some C++ in the backend iff the improvements are substantial enough to warrant maintaining the additional code and including an Rcpp dependency. Anyone interested in resuming this work?

jgabry, Apr 22 '18

Crazy busy at work, but I can probably try in a week or three. Have you looked into the code already posted?

Avi

aadler, Apr 23 '18

Ok no rush! I did look at the code, but it was a long time ago so I will definitely need to refresh my memory.

jgabry, Apr 23 '18

For me, the absurd memory usage required by multiple threads prevented me from using more than 1 core. My laptop has about 12 GiB RAM available, but that is easily exhausted by loo and causes out-of-memory exceptions.

Whoever works on optimization should make sure the data are not copied when multiple cores are enabled. There is no need to copy, because most of the data are never modified. On Linux, forked worker processes share the parent's memory pages copy-on-write (and threads share a single address space outright), so memory usage with and without parallelism should be about the same.
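With threads (as opposed to forked processes), no copying is needed at all: every thread sees the same address space, so a large read-only log-likelihood matrix can be passed by const reference, and each thread writes only to its own slice of the output. A minimal sketch with `std::thread`, assuming a column-major S × N matrix and using a per-column sum as a stand-in workload; the function name and chunking scheme are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sums each column of a read-only S x N column-major matrix using several
// threads. The matrix is shared by const reference (never copied); each
// thread handles a contiguous block of columns and writes only its own
// entries of `out`, so no locking is needed.
std::vector<double> col_sums_threaded(const std::vector<double>& ll,
                                      std::size_t S, std::size_t N,
                                      unsigned n_threads) {
    assert(ll.size() == S * N && n_threads >= 1);
    std::vector<double> out(N, 0.0);
    auto worker = [&ll, &out, S](std::size_t begin, std::size_t end) {
        for (std::size_t j = begin; j < end; ++j) {
            double sum = 0.0;
            for (std::size_t s = 0; s < S; ++s) sum += ll[j * S + s];
            out[j] = sum;
        }
    };
    std::vector<std::thread> pool;
    const std::size_t chunk = (N + n_threads - 1) / n_threads;  // ceil(N / n_threads)
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(N, t * chunk);
        const std::size_t end = std::min(N, begin + chunk);
        if (begin < end) pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();
    return out;
}
```

Because the R API is not thread-safe, worker threads in an Rcpp version should touch only plain memory (e.g. a raw pointer to the matrix data), never R objects. On older toolchains, compile with `-pthread`.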

I suspect that Rcpp will copy data unnecessarily. Therefore, you may not be able to use some of the Rcpp helper functions.

jpritikin, Jun 04 '18

How is it being multithreaded? Is OpenMP being used?

aadler, Jun 29 '18