DALI
DALI CUDA error when running NVIDIA DeepLearningExamples with MXNET_ENABLE_CUDA_GRAPHS=1
The code is from https://github.com/NVIDIA/DeepLearningExamples/tree/master/MxNet/Classification/RN50v1.5, run with the environment variable MXNET_ENABLE_CUDA_GRAPHS=1. The run aborts with the following error:
[1,5]<stderr>: _DaliBaseIterator.__init__(self,
[1,5]<stderr>:2022-02-24 04:23:12,251:WARNING: DALI iterator does not support resetting while epoch is not finished. Ignoring...
[1,5]<stderr>:2022-02-24 04:23:12,251:INFO: Starting epoch 0
[1,3]<stderr>:terminate called after throwing an instance of 'dali::CUDAError'
[1,3]<stderr>: what(): CUDA runtime API error cudaErrorStreamCaptureUnsupported (900):
[1,3]<stderr>:operation not permitted when stream is capturing
@LSC527 In general, DALI is not capturable, meaning its GPU work cannot be issued while a CUDA stream is in capture mode. We can investigate, but a fix on DALI's side is unlikely to be possible if MXNet runs on stream 0.
Hi @LSC527,
I think it would be best to raise the issue in the DeepLearningExamples project and ask whether the given model supports MXNET_ENABLE_CUDA_GRAPHS=1. It is possible to use CUDA graphs for model training together with DALI (as NVIDIA does in MLPerf), but you need to check with the model maintainers for more details.
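In the meantime, a small pre-flight check in the launch script can flag the combination before the process aborts. The following is only a sketch: the helper names `cuda_graphs_requested` and `check_dali_with_cuda_graphs` are hypothetical and not part of DALI, MXNet, or DeepLearningExamples, and the rule for which values count as "enabled" is an assumption that may not match how MXNet itself parses the variable.

```python
import os

# Variable name taken from this issue.
ENV_VAR = "MXNET_ENABLE_CUDA_GRAPHS"


def cuda_graphs_requested(env=None):
    """Return True if the variable looks enabled.

    Assumption: anything other than empty, "0", or "false" counts as enabled.
    MXNet's own parsing may differ.
    """
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "").strip().lower()
    return value not in ("", "0", "false")


def check_dali_with_cuda_graphs(use_dali, env=None):
    """Return a warning message if DALI and CUDA graphs are both requested, else None."""
    if use_dali and cuda_graphs_requested(env):
        return (
            f"{ENV_VAR} is set while the DALI data loader is in use. "
            "This combination produced cudaErrorStreamCaptureUnsupported (900) "
            "in this issue; check with the model maintainers whether it is supported."
        )
    return None


if __name__ == "__main__":
    # Hypothetical usage: call this before building the DALI iterators.
    message = check_dali_with_cuda_graphs(use_dali=True)
    if message:
        print(message)
```

This does not make DALI capturable. It only surfaces the configuration conflict early, so the run can either be started without the variable or redirected to the model maintainers as suggested above.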