[OpenCL] "Cannot allocate memory" issues on PPC64
Any OpenCL test run on our Power9 machine results in the following error (with my environment):
PYOPENCL_CTX="0:1" ./run_tests.py silx.opencl.test.test_addition.suite
[...]
OSError: [Errno 12] Cannot allocate memory
The reason is linked to scikit-cuda:
- On one hand, scikit-cuda creates a CUBLAS context to get the version number when imported.
- On the other hand, silx creates an OpenCL context on all present devices to pick the best one.
For some reason doing (1) then (2) succeeds, but doing (2) then (1) fails on Power9.
The following fails:
from silx.opencl.convolution import Convolution
from silx.math.fft.cufft import CUFFT
The following succeeds:
from silx.math.fft.cufft import CUFFT
from silx.opencl.convolution import Convolution
A workaround is to modify the order of imports.
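As a minimal sketch of this workaround, the import order can be pinned in one place so that it is not reshuffled by accident. The helper `import_in_order` and the constant `CUDA_FIRST_ORDER` are hypothetical names, not part of silx; the module names are the two from the examples above. This only helps if it runs before anything else in the process has imported either module.

```python
import importlib

# Assumption: the two modules from the report, CUDA-backed one first,
# since that is the order that was observed to succeed.
CUDA_FIRST_ORDER = (
    "silx.math.fft.cufft",
    "silx.opencl.convolution",
)


def import_in_order(module_names):
    """Import the given modules one by one, in order, and return them by name."""
    modules = {}
    for name in module_names:
        # importlib.import_module behaves like a plain `import name` statement;
        # a module that is already imported is simply returned from the cache.
        modules[name] = importlib.import_module(name)
    return modules
```

A possible use at the very top of an entry-point script would be `mods = import_in_order(CUDA_FIRST_ORDER)`, followed by `CUFFT = mods["silx.math.fft.cufft"].CUFFT`.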
The OpenCL contexts on all visible devices seem to be created when calling pyopencl.get_platforms().
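To see which devices are visible to that call, something like the following sketch can be used. It only enumerates platforms and devices; it does not by itself show whether contexts get created. The function name `describe_devices` is hypothetical, and the module is passed in as an argument so the function can be exercised without a GPU. The `"platform:device"` labels follow the same convention as the `PYOPENCL_CTX="0:1"` setting above.

```python
def describe_devices(cl):
    """Return (index, platform name, device name) for every visible device.

    `cl` is expected to be the pyopencl module (or an object with the same
    get_platforms() / get_devices() / name interface).
    """
    found = []
    for p_idx, platform in enumerate(cl.get_platforms()):
        for d_idx, device in enumerate(platform.get_devices()):
            found.append((f"{p_idx}:{d_idx}", platform.name, device.name))
    return found


if __name__ == "__main__":
    try:
        import pyopencl
    except ImportError:
        print("pyopencl is not installed")
    else:
        for index, platform_name, device_name in describe_devices(pyopencl):
            print(index, platform_name, device_name)
```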
This occurs on both our Power9 and DGX1 servers. It might be due to the nvidia-persistenced daemon.
For now, I see no obvious fix other than being careful about the import order.