Issue with pycuda and multiprocessing
I created a simple gist that exposes an error for me on some machines but not others. It initializes the driver, then spawns a child process and initializes the driver there as well. https://gist.github.com/wconstab/f06362277a6235aa87bdc4235bfde731
Running py.test test_pycuda_multi.py on some machines causes this error in the child process:
LogicError: cuInit failed: initialization error
There are differences in [OS, whether Docker is used, Python 2 vs 3, Nvidia driver version] between the machines where this gist works and where it crashes, so I won't attempt to attribute the failure to one specific configuration.
It would be helpful to know whether this should work or whether the behavior is expected to be undefined.
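What the gist exercises can be sketched with the standard library alone, no GPU required. Under the fork start method (the default on Linux), the child inherits the parent's process state, which is exactly the situation the CUDA driver objects to after cuInit. The `_driver_initialized` flag below is a hypothetical stand-in for the driver state, not pycuda's API:

```python
import multiprocessing as mp

# Hypothetical stand-in for the CUDA driver state; in the real gist this
# would be pycuda's cuda.init() / pycuda.autoinit.
_driver_initialized = False

def init_driver():
    global _driver_initialized
    _driver_initialized = True

def _child_report(q):
    # Under fork, the child sees the parent's flag: the same inherited
    # state that makes a real cuInit fail in the child process.
    q.put(_driver_initialized)

def child_inherits_state():
    ctx = mp.get_context("fork")  # fork is a Unix-only start method
    q = ctx.Queue()
    p = ctx.Process(target=_child_report, args=(q,))
    p.start()
    inherited = q.get()
    p.join()
    return inherited

if __name__ == "__main__":
    init_driver()
    print(child_inherits_state())  # True on Linux: state leaked into the child
```

With a real driver in place of the flag, that leaked state is what surfaces as `cuInit failed: initialization error`.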
It initializes the driver, then spawns a child process

According to Nvidia, you may not fork() after initializing CUDA.
Can you give a pointer to their mention of this?
I'm not too surprised that "initialize CUDA, fork, initialize CUDA" is unsupported.
I'm hoping it's at least OK to initialize CUDA, fork+exec a clean process, and then initialize CUDA again there.
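A fork+exec'd child starts with a fresh address space, so nothing from the parent's driver state is inherited and the child can initialize CUDA from scratch. A minimal stdlib sketch of that pattern (the pycuda import mentioned in the comment is the assumed per-process init; adapt it to your setup):

```python
import subprocess
import sys

def run_in_clean_process(code):
    """Run `code` in a freshly exec'd interpreter (fork+exec), so no CUDA
    driver state is inherited from the current process."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # In the real use case the child would `import pycuda.autoinit` and do
    # its own GPU work; a plain print stands in for that here.
    print(run_in_clean_process("print('fresh child, safe to cuInit')"))
```

This is also what Python's multiprocessing "spawn" start method gives you: each worker is a new interpreter rather than a fork of the CUDA-initialized parent.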
As another data point, we have observed that driver versions up to 361.42 seem to allow this, but it fails with newer versions.
This thread seems to document the behavior:
https://devtalk.nvidia.com/default/topic/973477/cuda-programming-and-performance/-cuda8-0-bug-child-process-forked-after-cuinit-get-cuda_error_not_initialized-on-cuinit-/
It initializes the driver, then spawns a child process

According to Nvidia, you may not fork() after initializing CUDA.
You can look at this issue: https://devtalk.nvidia.com/default/topic/1052699/tensorrt/how-to-implement-tensorrt-as-an-inference-server-/post/5343685/#reply pycuda doesn't work in a child process; it can't auto-init the CUDA device. So strange.
This error is due to the multiprocessing functionality in PyTorch's DataLoader class. If you can find another solution, I would urge you to use it, but a shortcut to get rid of the error is as follows: set the "num_workers" argument of the DataLoader to zero. This resolves the error temporarily, at the cost of increased computation time per epoch.
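The effect of that setting can be illustrated without PyTorch: zero workers means all loading happens in the parent process (which has already initialized CUDA), while any positive worker count forks child processes, which is where the cuInit error comes from. The `load_batches` helper below is a simplified stand-in for that worker handling, not PyTorch's actual DataLoader implementation:

```python
import multiprocessing as mp

def _identity(x):
    # Trivial per-item "loading" work for the sketch.
    return x

def load_batches(dataset, num_workers=0):
    # Simplified stand-in for DataLoader's worker handling: with
    # num_workers=0 everything runs in the parent process, so no forked
    # child ever touches the already-initialized CUDA driver.
    if num_workers == 0:
        return [_identity(item) for item in dataset]
    # num_workers > 0 forks worker processes (the default start method on
    # Linux), reproducing the fork-after-cuInit situation from this thread.
    with mp.Pool(num_workers) as pool:
        return pool.map(_identity, dataset)
```

With `num_workers=0` the trade-off is exactly as described above: the error goes away, but loading is serialized in the main process.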