[Performance] CUDA kernel not found in registries for Op type: ScatterND
Describe the issue
Hello, I would like to know why, even after downloading the onnxruntime v1.17.3 library, I still get "CUDA kernel not found in registries for Op type: ScatterND", and inference performance is very slow. Are there any steps I should take?
I believe there are issues and a PR that already fixed this, but I still run into the problem.
To reproduce
1. I export the GPT-SoVITS model with opset version 17 (the highest version torch.onnx.export supports).
2. I use the C++ onnxruntime 1.17.3 API to run inference on CUDAExecutionProvider.
3. In the VITS module, CUDA inference is faster than CPU.
4. In the GPT module, however, CUDA is slower than CPU. The logs contain "CUDA kernel not found in registries for Op type: ScatterND" along with some MemcpyToHost and MemcpyFromHost nodes.
5. torch version 2.2, onnx version 1.16.
Urgency
Download from release
Platform
Linux
OS Version
ubuntu18.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.17.3
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No
Could you also share the code you used in the To reproduce section? CUDA kernel for ScatterND shouldn't be missing.
I just use this code (https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/onnx_export.py) to convert the torch model to ONNX, and then use
import onnxruntime
sessopt = onnxruntime.SessionOptions()
sessopt.log_severity_level = 1
sess = onnxruntime.InferenceSession("onnx/onnx_cc/onnx_t2s_cc_fsdec.onnx", sess_options=sessopt, providers=["CUDAExecutionProvider"])
In the Python InferenceSession, I print the result of sess.get_providers(), which shows ["CUDAExecutionProvider", "CPUExecutionProvider"], meaning some ops run on CPU. In my logs I see many "CUDA kernel not found in registries" messages and MemcpyFromHost/MemcpyToHost nodes, as below.
When I use these ONNX models in C++, the same problem occurs and inference is slow.
One thing I want to mention: the vits.onnx exported by onnx_export.py does not have this problem. Waiting for a reply and advice on solutions. Thanks so much.
@11721206, Could you try 1.18.0 or 1.18.1?
1.17.3 supports ScatterND up to opset 13 but your model is opset 17: https://github.com/microsoft/onnxruntime/blob/0453cd761860e68d3852e7f81a5092c98369bb75/onnxruntime/core/providers/cuda/tensor/scatter_nd.cc#L23
1.18.1 supports ScatterND up to opset 18: https://github.com/microsoft/onnxruntime/blob/387127404e6c1d84b3468c387d864877ed1c67fe/onnxruntime/core/providers/cuda/tensor/scatter_nd.cc#L42
My torch version does not support opset 18, so I used onnx's convert_version to change the opset version, but I got the same issue as with opset 17, i.e. "CUDA kernel not found in registries for Op type: ScatterND".
https://github.com/microsoft/onnxruntime/blob/0453cd761860e68d3852e7f81a5092c98369bb75/onnxruntime/core/providers/cuda/tensor/scatter_nd.cc#L21-L23 Such lines indicate that the operator is supported since opset version 13, not up to 13.
ScatterND was updated in opset 16 (new attribute reduction). The exported graph targets opset 17, but ORT 1.17.3 still registers the older definition, so it fails to find a compatible CUDA kernel.
@11721206 please try, as suggested above, upgrading onnxruntime-gpu to 1.18.0 or 1.18.1. It should work with opset 17 and you don't have to switch the opset version you use.
pip install onnxruntime-gpu==1.18.0 # or 1.18.1
I have tried and tested that, but the result is the same. For now I am trying to avoid ScatterND by using a list instead, which can work around the problem. Are there any other suggestions I can try? I will share my experiment results.
@11721206 That is weird; on my end I can reproduce the warning, and it goes away after upgrading. Could you check whether there are different but older onnxruntime packages, e.g., onnxruntime-training, in the environment (pip list | grep onnxruntime)? If there are, please uninstall them and reinstall onnxruntime-gpu. I don't have another clue for now.
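In case pip's output is ambiguous, here is a small Python equivalent of that `pip list | grep onnxruntime` check; it lists every installed distribution whose name contains "onnxruntime", since a stray CPU-only or training package installed next to onnxruntime-gpu can shadow the CUDA build:

```python
# List every installed distribution whose name contains "onnxruntime".
from importlib import metadata

found = sorted(
    {f"{dist.metadata['Name']} {dist.version}"
     for dist in metadata.distributions()
     if "onnxruntime" in (dist.metadata["Name"] or "").lower()}
)
print("\n".join(found) or "no onnxruntime packages installed")
```

If more than one entry shows up, uninstall all of them and reinstall only onnxruntime-gpu.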
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
This issue has been automatically closed as 'not planned' because it has been marked as 'stale' for more than 30 days without activity. If you believe this is still an issue, please feel free to reopen it.