
[OpenCL] "Cannot allocate memory" issues on PPC64

Open • pierrepaleo opened this issue 6 years ago • 1 comment

Any OpenCL test run on our Power9 machine results in the following error (with my environment):

PYOPENCL_CTX="0:1" ./run_tests.py silx.opencl.test.test_addition.suite
[...]
OSError: [Errno 12] Cannot allocate memory

The cause is related to scikit-cuda:

  1. On one hand, scikit-cuda creates a cuBLAS context when imported, in order to get the version number.
  2. On the other hand, silx creates an OpenCL context on every device present, in order to pick the best one.

For some reason, doing (1) then (2) succeeds, but doing (2) then (1) fails on Power9.

The following fails:

from silx.opencl.convolution import Convolution
from silx.math.fft.cufft import CUFFT

The following succeeds:

from silx.math.fft.cufft import CUFFT
from silx.opencl.convolution import Convolution

A workaround is to change the import order so that the scikit-cuda-based module (silx.math.fft.cufft) is imported before any silx.opencl module.
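One way to make that ordering explicit is to do the imports through a small helper at the top of the entry-point script. This is only a sketch: `import_in_order` and `PREFERRED_ORDER` are hypothetical names, not part of silx, and the module names are the two from this report.

```python
import importlib

# Order that succeeds on the affected machine according to this report:
# the CUDA-backed module first, then the OpenCL-backed one.
PREFERRED_ORDER = (
    "silx.math.fft.cufft",
    "silx.opencl.convolution",
)


def import_in_order(module_names=PREFERRED_ORDER):
    """Import the given modules one by one, in the given order.

    Returns a dict mapping each module name to the imported module,
    or to None if the module could not be imported (ImportError).
    Hypothetical helper, not part of silx.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # Missing optional dependency: record it and keep going.
            loaded[name] = None
    return loaded
```

Calling `import_in_order()` before any other silx import should pin the order in one place, rather than relying on the order of `import` statements scattered across modules.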

pierrepaleo, Aug 01 '19 15:08

The OpenCL contexts on all visible devices seem to be created when calling pyopencl.get_platforms(). This occurs on both our Power9 and DGX1 servers. It might be due to the nvidia-persistenced daemon.
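To check which devices are visible on a given machine, a minimal diagnostic along these lines may help. It only uses `pyopencl.get_platforms()`, `Platform.get_devices()` and the `name` attributes; `list_opencl_devices` is a hypothetical helper name, and the broad exception handling is an assumption made so that the script also runs where no OpenCL driver is installed.

```python
def list_opencl_devices():
    """Return a list of (platform name, device name) pairs.

    Returns an empty list if pyopencl is not installed or if the
    platforms cannot be enumerated. Hypothetical diagnostic helper.
    """
    try:
        import pyopencl
    except ImportError:
        return []

    found = []
    try:
        # This is the call suspected above of touching every visible device.
        for platform in pyopencl.get_platforms():
            for device in platform.get_devices():
                found.append((platform.name, device.name))
    except Exception:
        # Assumption: any driver-level failure is treated as "nothing usable".
        return []
    return found


if __name__ == "__main__":
    for platform_name, device_name in list_opencl_devices():
        print(platform_name, "/", device_name)
```

Running it once before and once after importing silx.math.fft.cufft would show whether the enumeration itself is affected by the import order.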

For now I see no obvious fix other than being careful with the import order.

pierrepaleo, Aug 02 '19 08:08