NudeNet
CUDA error when inference with onnxruntime-gpu
When I tried to run inference with onnxruntime-gpu, a CUDA error occurred.
import onnxruntime

def __init__(self, model_name="default"):
    checkpoint_path = '/root/tensor/nudenet/checkpoint/detector_v2_default_checkpoint.onnx'
    classes_path = '/root/tensor/nudenet/checkpoint/detector_v2_default_classes'
    # Available providers: CPUExecutionProvider, CUDAExecutionProvider
    self.detection_model = onnxruntime.InferenceSession(checkpoint_path, providers=["CUDAExecutionProvider"])
    self.classes = [c.strip() for c in open(classes_path).readlines() if c.strip()]
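As an aside, ONNX Runtime treats the providers argument as a priority list and falls back down it, so listing CPUExecutionProvider after CUDAExecutionProvider at least keeps session creation working on machines without a usable GPU (it does not avoid the runtime CUDA error above, since the failing node still executes on CUDA). A minimal sketch; pick_providers is a hypothetical helper name:

```python
def pick_providers(available, prefer_gpu=True):
    """Build an ONNX Runtime provider priority list from the list
    reported by onnxruntime.get_available_providers()."""
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always keep CPU as a fallback
    return providers

# Usage (assumes onnxruntime is installed):
#   import onnxruntime
#   providers = pick_providers(onnxruntime.get_available_providers())
#   session = onnxruntime.InferenceSession(checkpoint_path, providers=providers)
```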
The error is:
2021-03-07 06:53:12.871020963 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
2021-03-07 06:53:12.871079083 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Loop node. Name:'generic_loop_Loop__492' Status Message: Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
Traceback (most recent call last):
File "detector.py", line 115, in <module>
print(m.detect("/root/tensor/image-quality-assessment/t1.jpg"))
File "detector.py", line 90, in detect
outputs = self.detection_model.run(
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Loop node. Name:'generic_loop_Loop__492' Status Message: Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
Specify versions of the following libraries
- nudenet
- onnxruntime-gpu: 1.7
- CUDA 11.0.3 and cuDNN 8.0.2.4
- RTX 3090
When I run the model with onnxruntime on CPU, everything is fine. I also converted the ONNX model to pb, and it runs on the tfserving_gpu Docker image.
Does the node configuration need to be changed, or is this a problem in onnxruntime-gpu?
@yanyabo111 I am able to reproduce the issue. I will try to figure it out when I get some free time. Meanwhile, you can fall back to previous versions of NudeNet and use the TensorFlow variants (those work with GPU).
@bedapudi6788 I really appreciate your hard work, is there anything I can help with?
Having the same issue here. Also tried other versions of onnxruntime-gpu (1.4.0 to 1.7.0)
FYI - I hit the same bug with the ONNX model in releases and was able to resolve it by re-exporting at opset 11: I pulled down the TensorFlow model (detector_v2_default_checkpoint_tf), converted it to ONNX using tf2onnx, and no more exception.
The tf2onnx command I used after I downloaded the TF model was:
python -m tf2onnx.convert --saved-model c:\saved_model_dir --opset 11 --output saved_model.onnx
Hope that helps!
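To sanity-check that the re-export above actually landed on opset 11, you can inspect the model's opset_import entries with the onnx package. A small sketch; the helper takes (domain, version) pairs directly so it can be shown without a model file, and default_opset is a name I made up:

```python
def default_opset(opset_imports):
    """Return the version of the default ONNX operator domain from a
    model's opset imports, given as (domain, version) pairs, or None
    if absent. The default domain appears as "" (or "ai.onnx")."""
    for domain, version in opset_imports:
        if domain in ("", "ai.onnx"):
            return version
    return None

# With the real library (assumes `pip install onnx`), roughly:
#   import onnx
#   m = onnx.load("saved_model.onnx")
#   default_opset((i.domain, i.version) for i in m.opset_import)
```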
Worked :) thanks a lot.
I hit a similar problem when running inference on CUDA: FAIL : Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument. Can you help me with my problem? Thank you! @mrjarhead