SIRIUS

GPU calls in CPU mode

Open haampie opened this issue 4 years ago • 4 comments

In various places we have if (acc::num_devices() > 0) { ... }, which still gets executed in CPU mode when the hardware is present. I only noticed this because I didn't yet have the fix for the excessive number of streams, and acc::create_streams made tests fail on Daint even with --control.processing_unit=cpu.

haampie, Aug 20 '20 08:08

A valid point. But the case where a GPU is present and the run is on the CPU is mostly for debugging purposes; it should not be used in production. The more likely case, code compiled with GPU support but no GPU device found, should be handled properly.

toxa81, Aug 20 '20 08:08

Yeah, I see. My real issue in the end appears to be not having set CRAY_CUDA_MPS=1. Running multi-process MPI tests in CPU mode on a single node with a GPU doesn't work otherwise.

haampie, Aug 20 '20 08:08

The if (acc::num_devices() > 0) check is used to guard calls to GPU functions when there is no device. A system without a device but with GPU-enabled code can be simulated using export CUDA_VISIBLE_DEVICES (should work). But it would make sense to disable at least the creation of streams if the processing unit is CPU.

simonpintarelli, Aug 20 '20 08:08
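
The suggestion in the comment above, skipping stream creation when the processing unit is CPU, could look roughly like the sketch below. It is only an illustration: processing_unit, device_backend and initialize_devices are hypothetical names, and the two callbacks merely stand in for acc::num_devices() and acc::create_streams, whose real signatures are not shown in this thread.

```cpp
#include <functional>

// Hypothetical stand-in for the value of --control.processing_unit.
enum class processing_unit { cpu, gpu };

// Hypothetical wrapper around the device layer so the sketch compiles without CUDA.
// In real code these callbacks would forward to the acc:: functions named in the thread.
struct device_backend
{
    std::function<int()> num_devices;        // assumed: returns the number of visible devices
    std::function<void(int)> create_streams; // assumed: creates the given number of streams
};

// Returns true only if GPU resources were actually set up.
inline bool initialize_devices(processing_unit pu, device_backend const& acc, int num_streams)
{
    if (pu != processing_unit::gpu) {
        return false; // CPU mode: do not touch the device at all
    }
    if (acc.num_devices() <= 0) {
        return false; // GPU-enabled build, but no device found
    }
    acc.create_streams(num_streams);
    return true;
}
```

Checking the requested processing unit first means that in CPU mode not even the device query is issued, which covers both cases discussed here: a GPU that is present but unwanted, and a GPU-enabled build with no device.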

Agree. But this happens very early in sirius::initialize(). This function should get the information about which device to use (CPU or GPU) as soon as possible. We can pass the information found on the command line or use a "hacky" solution with environment variables. Say, export SIRIUS_PU_DEVICE=CPU would be the only way to control which device to use.

toxa81, Aug 20 '20 09:08
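
A minimal sketch of the environment-variable idea from the last comment, assuming the variable is read once at the very start of initialization. SIRIUS_PU_DEVICE is the name proposed in the thread; the pu_device enum, parse_pu_device, pu_device_from_env and the case-insensitive matching are assumptions made for illustration, not existing SIRIUS code.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical result type: 'unspecified' means the variable was unset or not understood.
enum class pu_device { unspecified, cpu, gpu };

// Parse "cpu" or "gpu" case-insensitively (an assumption; the thread only shows "CPU").
inline pu_device parse_pu_device(std::string value)
{
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (value == "cpu") {
        return pu_device::cpu;
    }
    if (value == "gpu") {
        return pu_device::gpu;
    }
    return pu_device::unspecified;
}

// Read the override from the environment; the default name is the one proposed in the thread.
inline pu_device pu_device_from_env(char const* name = "SIRIUS_PU_DEVICE")
{
    char const* raw = std::getenv(name);
    if (raw == nullptr) {
        return pu_device::unspecified;
    }
    return parse_pu_device(raw);
}
```

Because std::getenv needs no command-line parsing, a check like this can run before any device call in the initialization path, which is the property the comment is after.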