onnxruntime EP_FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend

I have an EC2 instance of type g5g.xlarge. I have installed the following:

CUDA Toolkit: Cuda compilation tools, release 12.4, V12.4.131
CUDNN Version: 9.6.0
Python: 3.12
PyTorch: Compiled from source, as v2.5 is not available for aarch64.
Onnxruntime: Compiled from source, as the distribution package is not available for this architecture
Architecture: aarch64
OS: Amazon Linux 2023

When running the following code:

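def to_numpy(tensor):
    # ONNX Runtime expects NumPy arrays in host memory, so move the tensor to the CPU
    # (torch.Tensor has no .gpu() method; .cpu() is needed in both branches)
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
# (ort_session and input_batch are created earlier in the script)
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(input_batch)}
ort_outs = ort_session.run(None, ort_inputs)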

I am getting the following error:

EP Error: [ONNXRuntimeError] : 11 : EP_FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 


with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}} using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
2025-01-08 12:06:10.797719929 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
2025-01-08 12:06:10.797924540 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 

with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}}
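The cuDNN frontend JSON in the log describes a single CONV_FPROP node: AlexNet's first convolution (64 filters of size 3x11x11, stride 4, padding 2) on a 1x3x224x224 float input. As an illustrative sanity check (not part of ONNX Runtime or cuDNN; the helper names are hypothetical), the sketch below parses a trimmed copy of that JSON and recomputes the output shape with the usual formula out = floor((in + pad_before + pad_after - dilation*(k - 1) - 1) / stride) + 1, plus the packed NCHW strides. Both match the log, which suggests the node description itself is fine and the failure happens while cuDNN builds the operation graph, consistent with the sub-library loading error.

```python
import json

# Trimmed copy of the cuDNN frontend JSON from the log (only the fields used here).
GRAPH_JSON = """
{"nodes": [{"tag": "CONV_FPROP", "dilation": [1, 1], "pre_padding": [2, 2],
            "post_padding": [2, 2], "stride": [4, 4],
            "inputs": {"W": "w", "X": "x"}, "outputs": {"Y": "::Y"}}],
 "tensors": {"::Y": {"dim": [1, 64, 55, 55], "stride": [193600, 3025, 55, 1]},
             "w": {"dim": [64, 3, 11, 11], "stride": [363, 121, 11, 1]},
             "x": {"dim": [1, 3, 224, 224], "stride": [150528, 50176, 224, 1]}}}
"""


def conv_output_dims(x_dim, w_dim, pre, post, stride, dilation):
    """Expected NCHW output dims of a 2D forward convolution."""
    n, k = x_dim[0], w_dim[0]
    spatial = []
    for i in range(2):
        effective_kernel = dilation[i] * (w_dim[2 + i] - 1) + 1
        padded = x_dim[2 + i] + pre[i] + post[i]
        spatial.append((padded - effective_kernel) // stride[i] + 1)
    return [n, k] + spatial


def packed_strides(dims):
    """Strides of a fully packed (contiguous) tensor, innermost dimension last."""
    strides, acc = [], 1
    for d in reversed(dims):
        strides.append(acc)
        acc *= d
    return list(reversed(strides))


def check_conv_graph(text):
    """Return (dims and strides match the log, expected output dims)."""
    graph = json.loads(text)
    node = graph["nodes"][0]
    tensors = graph["tensors"]
    x = tensors[node["inputs"]["X"]]
    w = tensors[node["inputs"]["W"]]
    y = tensors[node["outputs"]["Y"]]
    expected = conv_output_dims(x["dim"], w["dim"], node["pre_padding"],
                                node["post_padding"], node["stride"], node["dilation"])
    ok = expected == y["dim"] and packed_strides(y["dim"]) == y["stride"]
    return ok, expected


print(check_conv_graph(GRAPH_JSON))  # (True, [1, 64, 55, 55])
```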

However, the output of the following code indicates that the installation is correct:

print("Pytorch CUDA:", torch.cuda.is_available())
print("Available Providers:", onnxruntime.get_available_providers())
print("Active Providers for this session:", ort_session.get_providers())

Output:

Pytorch CUDA: True
Available Providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
Active Providers for this session: ['CUDAExecutionProvider', 'CPUExecutionProvider']

To resolve this, I installed nvidia_cudnn_frontend v1.9.0 from source, but the issue is still not resolved.

nvidia-smi works (NVIDIA-SMI 550.127.08), and nvcc also works fine.

nvidia-cudnn-frontend==1.9.0
nvtx==0.2.10
onnx==1.17.0
onnxruntime-gpu==1.20.1
optree==0.13.1
torch==2.5.0a0+gita8d6afb
torchaudio==2.5.1
torchvision==0.20.1

Versions

Collecting environment information...
PyTorch version: 2.5.0a0+gita8d6afb
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Amazon Linux 2023.6.20241212 (aarch64)
GCC version: (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Clang version: Could not collect
CMake version: version 3.31.2
Libc version: glibc-2.34

Python version: 3.12.0 (main, Jan  5 2025, 18:22:01) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (64-bit runtime)
Python platform: Linux-6.1.119-129.201.amzn2023.aarch64-aarch64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA T4G
Nvidia driver version: 550.127.08
cuDNN version: Probably one of the following:
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_adv.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_cnn.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_precompiled.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_runtime_compiled.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_graph.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_heuristic.so.9
/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_ops.so.9
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         aarch64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
CPU(s):                               4
On-line CPU(s) list:                  0-3
Vendor ID:                            ARM
Model name:                           Neoverse-N1
Model:                                1
Thread(s) per core:                   1
Core(s) per socket:                   4
Socket(s):                            1
Stepping:                             r3p1
BogoMIPS:                             243.75
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache:                            256 KiB (4 instances)
L1i cache:                            256 KiB (4 instances)
L2 cache:                             4 MiB (4 instances)
L3 cache:                             32 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-3
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; CSV2, BHB
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] nvidia-cudnn-frontend==1.9.0
[pip3] nvtx==0.2.10
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.20.1
[pip3] optree==0.13.1
[pip3] torch==2.5.0a0+gita8d6afb
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[conda] Could not collect

Jan 09 '25 07:01 m0hammadjaan

I also built onnxruntime-gpu==1.20.0 and got the same error in the same place.

Jan 09 '25 07:01 m0hammadjaan

@tianleiwu, it is related to the cuDNN frontend.

Jan 09 '25 19:01 snnn

@snnn @tianleiwu any update on this issue?

Jan 14 '25 04:01 m0hammadjaan

@m0hammadjaan, please try adding the following environment variables to collect the cuDNN debug log:

export CUDNN_FRONTEND_LOG_FILE=stdout
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_LOGLEVEL_DBG=3
export CUDNN_LOGDEST_DBG=stdout

Then run your tests.
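Exporting these variables by hand is easy to get wrong (a misspelled variable name is silently ignored), so one option is to set them in the environment of a child process. The sketch below is an assumption-laden helper, not an official tool: the function names are hypothetical, and it uses `CUDNN_FRONTEND_LOG_FILE` (the frontend's log-destination variable) together with the other three variables above. Because the glibc dynamic loader reads LD_LIBRARY_PATH only at process start, any extra library directories are passed to the child instead of being changed inside an already-running interpreter.

```python
import os
import subprocess
import sys

# Logging switches from the comment above: the first two are read by the
# cuDNN frontend, the last two by the cuDNN backend library.
CUDNN_DEBUG_VARS = {
    "CUDNN_FRONTEND_LOG_FILE": "stdout",
    "CUDNN_FRONTEND_LOG_INFO": "1",
    "CUDNN_LOGLEVEL_DBG": "3",
    "CUDNN_LOGDEST_DBG": "stdout",
}


def cudnn_debug_env(base=None, extra_lib_dirs=()):
    """Return a copy of `base` (default: os.environ) with cuDNN logging enabled.

    `extra_lib_dirs` are prepended to LD_LIBRARY_PATH; the loader only reads
    that variable when a process starts, so it must be set for the child.
    """
    env = dict(os.environ if base is None else base)
    env.update(CUDNN_DEBUG_VARS)
    if extra_lib_dirs:
        old = env.get("LD_LIBRARY_PATH", "")
        parts = list(extra_lib_dirs) + ([old] if old else [])
        env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def run_with_cudnn_logging(script, extra_lib_dirs=()):
    """Run `script` with the current interpreter and cuDNN logging enabled."""
    env = cudnn_debug_env(extra_lib_dirs=extra_lib_dirs)
    return subprocess.run([sys.executable, script], env=env, check=False)
```

For example, `run_with_cudnn_logging("your_script.py", ["/usr/local/cuda-12.4/targets/sbsa-linux/lib/"])`, where `your_script.py` is a placeholder for the failing inference script.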

CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED means cuDNN cannot load one of its sub-libraries (.so). I think it is most likely an environment setup issue; try adding /usr/local/cuda-12.4/targets/sbsa-linux/lib/ to the LD_LIBRARY_PATH environment variable.
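CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED does not say which library failed or why. A quick way to get the dynamic loader's own error message (for example a missing dependency or an undefined symbol) is to dlopen each cuDNN sub-library directly. The sketch below does that with `ctypes`. The sonames are the ones listed in PyTorch's environment report above, libnvrtc.so.12 is included because it appears in the strace output in this thread, and the helper names are hypothetical. Run it with the same LD_LIBRARY_PATH as the failing script.

```python
import ctypes

# Sonames from the environment report in this thread; libnvrtc.so.12 is
# included because it shows up in the strace output as well.
CUDNN_LIBS = [
    "libcudnn.so.9",
    "libcudnn_graph.so.9",
    "libcudnn_ops.so.9",
    "libcudnn_cnn.so.9",
    "libcudnn_adv.so.9",
    "libcudnn_heuristic.so.9",
    "libcudnn_engines_precompiled.so.9",
    "libcudnn_engines_runtime_compiled.so.9",
    "libnvrtc.so.12",
]


def try_load(names):
    """dlopen each library; map name -> None on success or the loader's error text."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


def report(results):
    """Print one OK / FAILED line per library."""
    for name, error in results.items():
        print(f"{name}: {'OK' if error is None else 'FAILED: ' + error}")


report(try_load(CUDNN_LIBS))
```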

Jan 14 '25 08:01 tianleiwu

@tianleiwu I set these environment variables and am still getting the same error. I then added /usr/local/cuda-12.4/targets/sbsa-linux/lib/ to LD_LIBRARY_PATH along with the other variables, and I am still getting the same error.

stdout
1
3
stdout
/usr/local/cuda-12.4/targets/sbsa-linux/lib/:/usr/local/cuda-12.4/lib64
EP Error: [ONNXRuntimeError] : 11 : EP_FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 

with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}} using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
2025-01-14 14:25:19.871049140 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
2025-01-14 14:25:19.871387856 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 

with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}}

Jan 14 '25 14:01 m0hammadjaan

@m0hammadjaan, when you installed cudnn-frontend from source (although ORT does not need it), did you verify the installation by following https://github.com/NVIDIA/cudnn-frontend?tab=readme-ov-file#checking-the-installation?

You can check shared library (*.so) loading like this:

export LD_DEBUG=libs
python your_script.py

OR

strace -e file python your_script.py 2> strace_output.txt

You should be able to see which *.so file failed to load during your test.
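With `strace -e file`, each failed lookup appears as an `openat(...) = -1 ENOENT ...` line. Because the loader probes several directories, a few failures per library are normal; the interesting case is a library that is requested but never opened successfully. The sketch below is a hypothetical helper using only the standard library, and its regular expression assumes strace's default output format; it lists such libraries from strace_output.txt.

```python
import os
import re

# Matches e.g.: openat(AT_FDCWD, "/path/libx.so.9", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
OPEN_RE = re.compile(
    r'open(?:at)?\(.*?"(?P<path>[^"]+\.so(?:\.[0-9.]+)?)"[^)]*\)'
    r'\s*=\s*(?P<ret>-?\d+)(?:\s+(?P<err>E[A-Z]+))?'
)


def never_loaded(strace_lines):
    """Return {library basename: [(path, errno), ...]} for shared libraries
    that were looked up but never opened successfully."""
    opened, failed = set(), {}
    for line in strace_lines:
        m = OPEN_RE.search(line)
        if not m:
            continue
        name = os.path.basename(m.group("path"))
        if int(m.group("ret")) >= 0:
            opened.add(name)
        else:
            failed.setdefault(name, []).append((m.group("path"), m.group("err")))
    return {name: tries for name, tries in failed.items() if name not in opened}


# Usage:
#   with open("strace_output.txt") as f:
#       for lib, tries in never_loaded(f).items():
#           print(lib, tries)
```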

Jan 14 '25 16:01 tianleiwu

@tianleiwu, yes, I followed the README you mentioned. Furthermore, the strace output looks as follows:

openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libonnxruntime_providers_cuda.so", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_adv.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_ops.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_cnn.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_graph.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_runtime_compiled.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_precompiled.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_heuristic.so.9", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libnvrtc.so.12", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_graph.so.9.6.0", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_ops.so.9.6.0", O_RDONLY|O_CLOEXEC) = 44
openat(AT_FDCWD, "/usr/local/cuda-12.4/targets/sbsa-linux/lib/libcudnn_engines_precompiled.so.9.6.0", O_RDONLY|O_CLOEXEC) = 44
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
2025-01-16 10:28:54.219682349 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
2025-01-16 10:28:54.219935090 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/t_mjan/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle);

with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}}
openat(AT_FDCWD, "alexnet.onnx", O_RDONLY) = 44
+++ exited with 0 +++

Jan 16 '25 10:01 m0hammadjaan

@tianleiwu any updates on it?

Jan 22 '25 06:01 m0hammadjaan

@m0hammadjaan, could you try building a binary from the tlwu/conv_cudnn_fe_fallback branch? It tries to fall back to running Conv without the cuDNN frontend. Let me know whether it resolves the issue.

Jan 22 '25 23:01 tianleiwu

Same problem here (fails with CUDA, works on CPU). The debug output doesn't help. Where should I look, and how can I pinpoint the cause?

Available Providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Active Providers for this session: ['CUDAExecutionProvider', 'CPUExecutionProvider']
[...]
2025-12-05 01:41:01.838744757 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] execute(handle, plan->get_raw_desc(), variant_pack_descriptor.get_ptr()) failed with message: , and code: CUDNN_STATUS_EXECUTION_FAILED_CUDART
2025-12-05 01:41:01.838766991 [E:onnxruntime:Default, cudnn_fe_call.cc:93 CudaCall] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=acer ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=483 ; expr=s_.cudnn_fe_graph->execute(cudnn_handle, s_.variant_pack, ws.get()); 
2025-12-05 01:41:01.838783846 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'Conv_0' Status Message: CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=acer ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=483 ; expr=s_.cudnn_fe_graph->execute(cudnn_handle, s_.variant_pack, ws.get());

$ pip freeze|egrep 'onnxruntime|cud|torch|tensor'
compressed-tensors==0.10.2
cupy-cuda12x==13.5.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
onnxruntime==1.23.2
onnxruntime-gpu==1.23.2
onnxruntime-openvino==1.20.0
pytorch-lightning==2.5.2
pytorch-metric-learning==2.8.1
pytorch-triton-rocm==3.1.0
safetensors==0.4.5
tensorboard==2.19.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.4
tensorflow==2.19.0
tensorrt==10.12.0.36
tensorrt_cu12==10.12.0.36
tensorrt_cu12_bindings==10.12.0.36
tensorrt_cu12_libs==10.12.0.36
torch==2.7.0
torch-audiomentations==0.12.0
torchaudio==2.7.0
torchmetrics==1.6.0
torchvision==0.22.0
types-tensorflow==2.12

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.274.02             Driver Version: 535.274.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|

Dec 05 '25 04:12 drzraf

Error codes related to "Failed to initialize CUDNN Frontend":
(1) Graph Execution Plan Creation failed, or execution plan not found: this is a known cuDNN frontend issue; it does not support some convolutions in certain older vision models (such as YOLOv3, YOLOv4, and MobileNet v1). One potential workaround is to pip install onnxruntime-gpu==1.19.0, which does not use the cuDNN frontend.
(2) CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED means the cuDNN frontend failed to load the cuDNN backend DLLs (shared libraries). Check the path setting (LD_LIBRARY_PATH), and also check whether the PyTorch version is compatible, since PyTorch might preload cuDNN.
(3) CUDNN_STATUS_EXECUTION_FAILED_CUDART means execution failed because the CUDA runtime reported an error. A CUDA runtime failure can have many causes, including earlier errors. You may use NVIDIA Compute Sanitizer to find the root cause.
For cuDNN troubleshooting, see https://docs.nvidia.com/deeplearning/cudnn/backend/v9.5.0/reference/troubleshooting.html
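For point (2), one way to check whether PyTorch or a pip wheel has already pulled in a different cuDNN build is to look at which libcudnn files are actually mapped into the process. The sketch below is Linux-only and uses hypothetical helper names: it reads /proc/self/maps (call it after importing torch and onnxruntime and running the session) and groups the mapped libcudnn files by directory. Seeing more than one directory hints at mixed installations; it does not prove that this is the cause.

```python
import os
import re

# Basenames such as libcudnn.so.9 or libcudnn_graph.so.9.6.0
CUDNN_RE = re.compile(r"^libcudnn\w*\.so[\w.]*$")


def mapped_cudnn_libs(maps_text):
    """Return {directory: sorted file names} for libcudnn*.so files listed
    in a /proc/<pid>/maps dump."""
    found = {}
    for line in maps_text.splitlines():
        fields = line.split()
        # maps columns: address perms offset dev inode [pathname]
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = " ".join(fields[5:])
        name = os.path.basename(path)
        if CUDNN_RE.match(name):
            found.setdefault(os.path.dirname(path), set()).add(name)
    return {d: sorted(names) for d, names in found.items()}


def current_process_cudnn():
    """Linux only: inspect the running interpreter's own memory mappings."""
    with open("/proc/self/maps") as f:
        return mapped_cudnn_libs(f.read())
```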

Dec 05 '25 23:12 tianleiwu

Is there an existing GH issue/PR to further dig and understand it, subscribe/track any possible future progress?

Dec 06 '25 21:12 drzraf


I updated the above comments to avoid confusion (since there are multiple error codes in this thread).

Dec 08 '25 22:12 tianleiwu