CUDA reports errors during multithreaded inference
We currently have scenarios where multiple threads run inference at the same time. Is the CUDA inference logic not thread-safe? Is there any solution for this?
The error message is below:
File "/tt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tt/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 396, in forward
return F.silu(input, inplace=self.inplace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tt/lib/python3.11/site-packages/torch/nn/functional.py", line 2058, in silu
return torch._C.nn.silu(input)
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
While testing several times, the following error also appeared earlier:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions
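One common workaround, since sharing a single model (and its CUDA context) across threads without synchronization can produce exactly these misaligned-address / illegal-memory-access errors, is to serialize all calls into the model with a lock. Below is a minimal sketch of that pattern; `model_forward` is a hypothetical stand-in for your model's actual forward pass, and `safe_infer` is an assumed wrapper name, not part of any library:

```python
import threading

# Global lock serializing access to the (not thread-safe) model / CUDA context.
_infer_lock = threading.Lock()

def model_forward(x):
    # Hypothetical stand-in for the real model's forward pass,
    # e.g. `with torch.no_grad(): model(x)` in the actual code.
    return x * 2

def safe_infer(x):
    # Only one thread at a time executes inference.
    with _infer_lock:
        return model_forward(x)

results = []
results_lock = threading.Lock()

def worker(value):
    out = safe_infer(value)
    with results_lock:
        results.append(out)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Serializing with a lock sacrifices GPU parallelism, so alternatives worth considering are giving each thread its own model replica, or funneling all requests through one dedicated inference worker thread fed by a queue.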