rstanarm

[Feature Request] Use GPU for QR decomp?

Open SteveBronder opened this issue 6 years ago • 5 comments

Summary:

gpuR has a qr method that can be used to speed up the QR decomposition in stan_glm.

Description:

For the QR code in stan_glm.fit we can use gpuR for the QR decomp. I recently ran into a problem where the QR decomp took about as long as the stan sampling itself. The GPU method above should speed things up a bit for large models.

I'm pretty sure I know what the code would look like, though I'm not sure where that option should go since there are already quite a few. We could add opencl and opencl_options, where opencl is a boolean and opencl_options is a list of OpenCL device and platform info.
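As a rough illustration of where a GPU backend could slot in, here is a minimal base-R sketch of a scaled thin QR with a pluggable decomposition function. The names `scaled_thin_qr` and `qr_fun` are hypothetical and this is not the actual stan_glm.fit code. It assumes that any replacement for `base::qr` (for example a gpuR-backed one) returns an object that `qr.Q` and `qr.R` can handle, which has not been verified here.

```r
# Sketch only: `scaled_thin_qr` and `qr_fun` are hypothetical names,
# not part of rstanarm or gpuR.
scaled_thin_qr <- function(X, qr_fun = base::qr) {
  stopifnot(is.matrix(X), nrow(X) > ncol(X))
  decomp <- qr_fun(X)              # the step a GPU backend would replace
  scale  <- sqrt(nrow(X) - 1)
  Q_star <- qr.Q(decomp) * scale   # thin Q, rescaled
  R_star <- qr.R(decomp) / scale   # matching upper-triangular factor
  R_inv  <- backsolve(R_star, diag(ncol(X)))  # inverse of R_star
  list(Q = Q_star, R = R_star, R_inv = R_inv)
}

set.seed(1)
X   <- matrix(rnorm(2000 * 10), nrow = 2000, ncol = 10)
fit <- scaled_thin_qr(X)

# The factors should reproduce X up to floating-point error
max(abs(fit$Q %*% fit$R - X))
```

Keeping the decomposition behind a single function argument means the CPU path stays the default and the GPU path is opt-in.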

SteveBronder avatar Oct 31 '19 15:10 SteveBronder

Cool! Any idea how much of a speedup we could get?

In terms of the opencl options, here’s how we’re letting the user set them in cmdstanr (scroll down to argument names):

https://mc-stan.org/cmdstanr/reference/model-method-compile.html

I would say we should use the same names (Boolean opencl is already the same) or change cmdstanr to use the list of info you suggested instead of individually named arguments. Either way is fine as long as we’re consistent across the R packages.

jgabry avatar Oct 31 '19 17:10 jgabry

Cool! Any idea how much of a speedup we could get?

Not sure! The transfer cost is real, so this would only pay off for data with around 1M cells or more (like a 1000x1000 matrix). The OpenCL stuff can also run in parallel on CPUs, though, which should be a bit faster for smaller data. Ballpark for largish data, I'd say a 10x or so speedup. I can run some benchmarks to find out.
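A benchmark along these lines could start from a small harness like the one below. It is a sketch using only base R. `time_qr` and its arguments are hypothetical names, and the sizes are kept small so it runs quickly. For a fair GPU comparison, the function passed as `qr_fun` would need to include the host-to-device transfer inside the timed call.

```r
# Sketch only: `time_qr` is a hypothetical helper, not part of any package.
time_qr <- function(n, p, qr_fun = base::qr, reps = 3) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Time the decomposition several times and keep the median
  elapsed <- vapply(seq_len(reps), function(i) {
    system.time(qr_fun(X))[["elapsed"]]
  }, numeric(1))
  data.frame(n = n, p = p, cells = n * p, median_sec = median(elapsed))
}

set.seed(1)
sizes   <- list(c(100, 100), c(300, 300), c(1000, 100))
results <- do.call(rbind, lapply(sizes, function(s) time_qr(s[1], s[2])))
print(results)
```

Running the same harness with a GPU-backed `qr_fun` and larger sizes would show where the transfer cost stops dominating.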

I would say we should use the same names (Boolean opencl is already the same) or change cmdstanr to use the list of info you suggested instead of individually named arguments. Either way is fine as long as we’re consistent across the R packages.

I'm not sure if we would ever want to add more flags for the OpenCL stuff (like if we ever move over to multiple devices), but if we do, I wouldn't mind having a list like

$compile(
  quiet = TRUE,
  threads = FALSE,
  opencl = FALSE,
  opencl_options = list(opencl_platform_id = 0, opencl_device_id = 0),
  compiler_flags = NULL
)

Though we could also have opencl_options be NULL by default, and if the value is not NULL then we look at the list of options, à la

$compile(
  quiet = TRUE,
  threads = FALSE,
  opencl_options = NULL,
  compiler_flags = NULL
)

Then users would call

mod$compile(
  opencl_options = list(opencl_platform_id = 0, opencl_device_id = 0)
)

to turn on the OpenCL stuff. Though you have more user-facing experience than me, so if that's confusing to users we can just do the true/false and arguments.
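The NULL-by-default variant could be handled with a small resolver like this. It is only a sketch: `resolve_opencl_options` is a hypothetical helper, not a cmdstanr or rstanarm function, and the element names are simply the ones from the list proposed above.

```r
# Sketch only: hypothetical helper illustrating the NULL-by-default design.
resolve_opencl_options <- function(opencl_options = NULL) {
  # NULL means OpenCL stays off
  if (is.null(opencl_options)) {
    return(list(enabled = FALSE,
                platform_id = NA_integer_,
                device_id = NA_integer_))
  }
  stopifnot(is.list(opencl_options))
  required <- c("opencl_platform_id", "opencl_device_id")
  missing_names <- setdiff(required, names(opencl_options))
  if (length(missing_names) > 0) {
    stop("opencl_options is missing: ",
         paste(missing_names, collapse = ", "))
  }
  # Each id must be a single non-negative integer
  ids <- vapply(opencl_options[required], as.integer, integer(1))
  if (anyNA(ids) || any(ids < 0)) {
    stop("OpenCL platform and device ids must be non-negative integers")
  }
  list(enabled = TRUE,
       platform_id = ids[["opencl_platform_id"]],
       device_id = ids[["opencl_device_id"]])
}

off <- resolve_opencl_options()
on  <- resolve_opencl_options(
  list(opencl_platform_id = 0, opencl_device_id = 1)
)
```

With this shape a separate boolean flag is not needed, and extra entries (say, for multiple devices) could be added to the list later without changing the function signature.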

SteveBronder avatar Oct 31 '19 18:10 SteveBronder

Thanks @SteveBronder. @bgoodri what do you think about enabling GPU for QR in rstanarm?

jgabry avatar Nov 05 '19 17:11 jgabry

It isn't a big bottleneck to do the QR decomposition once in doubles, but if we could do it faster I wouldn't object. But I had assumed GPU stuff was not feasible for packages like rstanarm on Windows / Mac because the CRAN server that makes binaries won't have the same GPU as the user. I suppose a user could recompile rstanarm from source, but there are not many people who do that.

bgoodri avatar Nov 05 '19 22:11 bgoodri

Let me see how hard it is to get the gpuR stuff set up. If it's very lame then I'll close this.

SteveBronder avatar Nov 05 '19 22:11 SteveBronder