SIRIUS
GPU calls in CPU mode
At various places we have if (acc::num_devices() > 0) { ... } which still gets executed in CPU mode when you have the hardware. I just noticed this because I didn't have the fix for the excessive number of streams yet, and acc::create_streams made tests fail on Daint even though --control.processing_unit=cpu was set.
A valid point. But the case "GPU is present, but the run is on CPU" is mostly for debugging purposes. It should not be used in production. The more likely case, "code compiled with GPU support, but no GPU device found", should be handled properly.
Yeah, I see. My real issue in the end appears to be not having set CRAY_CUDA_MPS=1. Running multi-process MPI tests in CPU mode on a single node with a GPU doesn't work otherwise.
The if (acc::num_devices() > 0) check is used to guard calls to GPU functions when there is no device. A system without a device, but with GPU-enabled code, can be simulated using export CUDA_VISIBLE_DEVICES (should work). But it would make sense to disable at least the creation of streams if the processing unit is CPU.
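A minimal sketch of what such a guard could look like, assuming the requested processing unit is already known at the point where streams are created. The names processing_unit_t and should_create_streams are hypothetical and are not taken from the SIRIUS sources; the device count would come from acc::num_devices() in the real code.

```cpp
// Hypothetical stand-in for the processing unit setting; not the actual SIRIUS type.
enum class processing_unit_t { cpu, gpu };

// Return true only if the run was requested on the GPU AND a device is present.
// In the real code num_devices would be the result of acc::num_devices().
inline bool should_create_streams(processing_unit_t pu, int num_devices)
{
    return pu == processing_unit_t::gpu && num_devices > 0;
}
```

With a check like this, a CPU run on a node that has a GPU would skip stream creation, and a GPU build on a node without a device would still be guarded as before.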
Agree. But this happens very early in sirius::initialize(). This function should get information about the device to use as soon as possible. We can pass the information found in the command line or use a "hacky" solution with environment variables. Say, `export SIRIUS_PU_DEVICE=CPU` would be the only way to control which device to use.
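The environment variable idea could be sketched roughly as follows. This is only an illustration: the variable name SIRIUS_PU_DEVICE is the one proposed above, while device_choice, parse_pu_device and pu_device_from_env are hypothetical helpers, and both the case-insensitive matching and the acceptance of a GPU value are assumptions.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical result type; "unspecified" means the variable is unset or not recognized.
enum class device_choice { cpu, gpu, unspecified };

// Parse the value of the environment variable (may be nullptr if unset).
// Matching is case-insensitive, which is an assumption of this sketch.
inline device_choice parse_pu_device(const char* value)
{
    if (value == nullptr) {
        return device_choice::unspecified;
    }
    std::string s(value);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (s == "cpu") {
        return device_choice::cpu;
    }
    if (s == "gpu") {
        return device_choice::gpu;
    }
    return device_choice::unspecified;
}

// Read SIRIUS_PU_DEVICE from the environment; could be called at the start of initialization.
inline device_choice pu_device_from_env()
{
    return parse_pu_device(std::getenv("SIRIUS_PU_DEVICE"));
}
```

Because std::getenv needs no prior setup, a lookup like this could run before any GPU call is made, which is the point of doing it early in the initialization.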