CUDNN_STATUS_EXECUTION_FAILED when Triton server is running
Hi, I deployed an ensemble model with Triton Server. After serving successfully for 5 days, the server crashed with a cuDNN error, CUDNN_STATUS_EXECUTION_FAILED. The Docker image we use is nvcr.io/nvidia/tritonserver:22.05-py3, running on an A40 GPU with driver version 470.57.02. The log printed is as below:
2022-08-13 11:42:42.261057647 [E:onnxruntime:log, cuda_call.cc:118 CudaCall] CUDNN failure 8: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=asr5.jd.163.org ; expr=cudnnConvolutionForward(s_.handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data);
2022-08-13 11:42:42.261148370 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running Conv node. Name:'Conv_35' Status Message: CUDNN error executing cudnnConvolutionForward(s_.handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data)
And here is the printed stack trace:
0# 0x0000558E7C9FB1B9 in tritonserver
1# 0x00007F14038C00C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# cask_cudnn::ft::naryNode<cask_cudnn::sasskr::ft_sass_level>::hasBranchOrLeaf(unsigned int) const in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
3# cask_cudnn::ft::subTree<cask_cudnn::sasskr::ft_sass_level, cask_cudnn::Convolution, cask_cudnn::ConvShader, cask_cudnn::ShaderList<cask_cudnn::ConvShader, cask_cudnn::Convolution> >::search(cask_cudnn::kr::search_t<42ul, 32ul> const*, cask_cudnn::Convolution const&, cask_cudnn::kernel_record_t**, cask_cudnn::kernel_record_t const*) const in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
4# cask_cudnn::ft::subTree<cask_cudnn::sasskr::ft_sass_level, cask_cudnn::Convolution, cask_cudnn::ConvShader, cask_cudnn::ShaderList<cask_cudnn::ConvShader, cask_cudnn::Convolution> >::functionalSearch(cask_cudnn::kr::search_t<42ul, 32ul> const*, cask_cudnn::Convolution const&, cask_cudnn::kernel_record_t**, cask_cudnn::kernel_record_t const*, int) const in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
5# cask_cudnn::SafeEnum<cask_cudnn::ErrorEnum> cask_cudnn::ft::convSearch<cask_cudnn::Convolution, (cask_cudnn::record_type_t)0>(cask_cudnn::ft::convTreeHandle*, cask_cudnn::ft::convTreeHandle::thsch_t*, cask_cudnn::Convolution const&, cask_cudnn::kernel_record_t**, cask_cudnn::kernel_record_t const*, int) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
6# cudnn::InternalStatus_t cudnn::cnn::infer::searchTree<cask_cudnn::ShaderList<cask_cudnn::ConvShader, cask_cudnn::Convolution>, cask_cudnn::Convolution, (cudnn::cnn::infer::subtree_t)0>(int, cask_cudnn::ft::convTreeHandle&, cudnn::cnn::infer::SASSEngineHelper*, cask_cudnn::ft::convSearchOptions&, cask_cudnn::Convolution&) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
7# cudnn::cnn::infer::CaskMixIn<cask_cudnn::Convolution, cask_cudnn::ShaderList<cask_cudnn::ConvShader, cask_cudnn::Convolution>, cask_cudnn::ConvShader, (cudnn::cnn::infer::subtree_t)0>::searchTree(int) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
8# cudnn::cnn::infer::SASS4dInferSubEngine<true, (cudnnTensorFormat_t)0, (cudnnTensorFormat_t)0, (cudnnTensorFormat_t)0, (cudnnDataType_t)0, (cudnn::cnn::infer::ipg_infer_choice_t)1, false, 80, (cudnn::cnn::infer::subtree_t)0>::initSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
9# cudnn::cnn::EngineInterface::isSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
10# cudnn::cnn::EngineContainer<(cudnnBackendEngineName_t)34, 113664ul>::initSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
11# cudnn::cnn::EngineInterface::isSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
12# cudnn::cnn::GeneralizedConvolutionEngine<cudnn::cnn::EngineContainer<(cudnnBackendEngineName_t)34, 113664ul> >::initSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
13# cudnn::cnn::EngineInterface::isSupported() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
14# cudnn::backend::ExecutionPlan::finalize_internal() in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
15# 0x00007F0A8E01939F in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
16# cudnn::backend::get_plan_for_legacy_algo(cudnnContext*, cudnn::backend::OperationSet const&, cudnn::backend::array_t<cudnnBackendEngineName_t const> const&, cudnnMathType_t, unsigned long, bool, cudnn::backend::ExecutionPlan&, unsigned long&, std::function<cudnn::backend::analyze_result_t (cudnn::backend::ExecutionPlan const&)> const&) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
17# cudnn::InternalStatus_t cudnn::backend::make_convolution_plan<cudnn::backend::ConvolutionForwardOperation, cudnnConvolutionFwdAlgo_t, cudnn::backend::EnginesAlgoMap<cudnnConvolutionFwdAlgo_t, 8> >(cudnnContext*, cudnnTensorStruct const*, cudnnFilterStruct const*, cudnnConvolutionStruct const*, cudnnTensorStruct const*, cudnnConvolutionFwdAlgo_t, void const*, void const*, cudnn::backend::OperationSet&, cudnn::backend::ExecutionPlan&, unsigned long&, bool) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
18# cudnn::backend::convolutionForward(cudnnContext*, void const*, cudnnTensorStruct const*, void const*, cudnnFilterStruct const*, void const*, cudnnConvolutionStruct const*, cudnnConvolutionFwdAlgo_t, void*, unsigned long, bool, void const*, void const*, void const*, cudnnActivationStruct const*, cudnnTensorStruct const*, void*) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
19# cudnn::cnn::convolutionForward(cudnnContext*, void const*, cudnnTensorStruct const*, void const*, cudnnFilterStruct const*, void const*, cudnnConvolutionStruct const*, cudnnConvolutionFwdAlgo_t, void*, unsigned long, void const*, cudnnTensorStruct const*, void*) in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
20# cudnnConvolutionForward in /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
21# 0x00007F0E10999E1D in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_cuda.so
22# 0x00007F0E108B5B06 in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_cuda.so
23# 0x00007F13F14F857F in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
24# 0x00007F13F14E1199 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
25# 0x00007F13F14E326C in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
26# 0x00007F13F0EBF06D in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
27# 0x00007F13F0EBF2E8 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
28# 0x00007F13F0E649CD in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
29# 0x00007F13F1AE2BBD in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
30# 0x00007F13F1AF85F3 in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
31# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
32# 0x00007F140417173A in /opt/tritonserver/bin/../lib/libtritonserver.so
33# 0x00007F14041720F7 in /opt/tritonserver/bin/../lib/libtritonserver.so
34# 0x00007F140422F411 in /opt/tritonserver/bin/../lib/libtritonserver.so
35# 0x00007F140416B5C7 in /opt/tritonserver/bin/../lib/libtritonserver.so
36# 0x00007F1403CB1DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
37# 0x00007F1404EBF609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
38# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
Any ideas?
Thanks for providing detailed logs! The failure looks to be occurring within ONNX Runtime. Assuming the workload is the same across runs, one cause of failure I can think of is unexpected input data being received, but I wouldn't expect that to crash the server like this.
If we were to try to reproduce it, we'd need a model, config, and reproduction instructions, but it might be outside the scope of Triton if the error is occurring inside ONNX Runtime. @GuanLuo, do you know whether this might be Triton-related?
This may also be a silent OOM error: https://github.com/microsoft/onnxruntime/issues/10894#issuecomment-1110915026. It's worth checking whether it's reproducible.
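To check for that, something like the following could log GPU memory over time and reveal the slow growth a silent OOM would imply. This is just a minimal sketch, not part of the original report: the device index, polling interval, and the pynvml (nvidia-ml-py3) dependency are assumptions you'd adapt to your setup.

```python
# Minimal sketch: poll GPU memory with pynvml to look for steady growth
# while the server handles traffic. Assumes the A40 is device 0.
import time
import pynvml  # pip install nvidia-ml-py3

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU index 0 is an assumption

try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
              f"used={mem.used / 1024**2:.0f} MiB / "
              f"total={mem.total / 1024**2:.0f} MiB")
        time.sleep(60)  # log once a minute; tune as needed
finally:
    pynvml.nvmlShutdown()
```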
Can you reproduce this in the latest 22.08 container @heibaidaolx123? It's possible the issue has already been fixed in a later release.
If it took 5 days to reproduce and the issue is memory related, you may be able to speed up the process by sending requests at a significantly higher rate in a loop through a custom client script. You could also run something simple like `watch -n 0.1 nvidia-smi` or `nvidia-smi dmon` in a separate shell to see whether GPU memory usage is constantly increasing while sending requests.
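A client loop along these lines could drive the server much harder than production traffic. This is only a sketch under stated assumptions: the model name "ensemble_model", input name "INPUT__0", shape, and datatype are placeholders, so substitute the values from your actual config.pbtxt and endpoint.

```python
# Minimal sketch: flood the Triton HTTP endpoint with inference requests
# to try to reproduce a memory-related crash faster than 5 days.
import numpy as np
import tritonclient.http as httpclient  # pip install tritonclient[http]

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input: replace name, shape, and dtype with your model's.
data = np.random.rand(1, 16000).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

for i in range(1_000_000):
    client.infer(model_name="ensemble_model", inputs=[inp])
    if i % 1000 == 0:
        print(f"sent {i} requests")
```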
Closing due to inactivity. Let us know if you need follow-up and we can reopen the issue.