
Let user choose which GPU to use

AlexMeinke opened this issue 1 year ago • 4 comments

Feature

Desired Behavior / Functionality

Currently, one can only enable or disable CUDA, and only globally, via the torchquad.set_up_backend function. First, this means that even on multi-GPU machines one can only ever use the first device, "cuda:0". Second, it means that integrating torchquad can break existing code, because set_up_backend globally changes how torch tensors are initialized. Instead, I propose adding device as an optional argument to the integrate function.
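For illustration, here is a pure-Python sketch of what such a per-call device argument could look like. Everything here is hypothetical, not the real torchquad API: the function name, signature, and device handling are illustrative only, and the sketch uses plain Python floats so that it has no torch dependency.

```python
import random

def integrate(fn, domain, n, device=None):
    # Hypothetical signature: `device` (e.g. "cuda:1") would override any
    # global default for this call only. In real torch code the sample
    # points would be created with torch.rand(n, device=device); this
    # dependency-free sketch accepts `device` but does not use it.
    a, b = domain
    # Plain Monte Carlo estimate of the integral of fn over [a, b].
    total = sum(fn(a + (b - a) * random.random()) for _ in range(n))
    return (b - a) * total / n

random.seed(0)
# Example call with the hypothetical per-call device argument.
estimate = integrate(lambda x: x * x, (0.0, 1.0), 100_000, device="cuda:1")
```

The point of the design is that the device choice is scoped to a single call, so nothing about the caller's global torch configuration changes.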

What Needs to Be Done

Unfortunately, I am not familiar enough with the library's code to make informed comments on how this can be implemented. I suspect that it's actually a fairly difficult request.

AlexMeinke avatar Aug 19 '22 10:08 AlexMeinke

Hi Alex!

While this may theoretically be feasible, there is an easier way.

In your shell session / env just set the CUDA_VISIBLE_DEVICES environment variable.

E.g. to use device 2 (on Linux, but similar on Windows, I think):

export CUDA_VISIBLE_DEVICES=2

For more info see [here](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/)
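If you prefer setting it from Python rather than the shell, the same restriction can be applied via os.environ, as long as it happens before torch initializes CUDA (the "2" below is just an example device index):

```python
import os

# Must run before the first CUDA call (i.e. before importing code that
# initializes CUDA), otherwise it has no effect on device visibility.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # example: expose only GPU 2
```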

Ofc this only works for single-GPU applications; multi-GPU would indeed be more complicated.

Hope this helps? :)

gomezzz avatar Aug 19 '22 12:08 gomezzz

Thanks for the quick suggestion. Unfortunately, this still doesn't solve the other issue I mentioned: with this approach, all tensors would by default be instantiated on the chosen device rather than on the CPU. For any single tensor instantiation this is easy to fix, but simply plugging torchquad into a large existing project can break a lot of things (which happens to be the case for me), unless one goes through the code and makes every CPU instantiation explicit.

AlexMeinke avatar Aug 19 '22 13:08 AlexMeinke

Hmmm. Yes, I see what you mean. Then the problem is really torchquad's behavior of setting the default device inside torch, I guess? One thing you could try, which I have not tested though, is to never call torchquad.set_up_backend :thinking:

That should avoid changing torch's default behavior.

gomezzz avatar Aug 19 '22 13:08 gomezzz

Yes, I had indeed tried this earlier (i.e. running your minimal example with that line commented out), but as best I can tell the entire computation then runs on the CPU (judging by the fact that integral_value ends up stored on the CPU), at which point I might as well use a scipy integrator.

Anyway, I have now hacked together an integrator that is sufficient for my purposes, but I imagine the suggested feature would still be useful if it could be implemented.

AlexMeinke avatar Aug 19 '22 13:08 AlexMeinke