apollo Running caddn_paddle model get CUDNN_STATUS_NOT

System information

- OS Platform and Distribution: Apollo 8.0 Docker image
- Apollo installed from: built from source in the Docker container
- Apollo version: 8.0
- Output of apollo.sh config if on master branch: (none given)
- nvidia-smi: NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
- GPU: Tesla V100

Steps to reproduce the issue:

- Install caddn_paddle with: amodel install caddn_paddle.zip
- Edit modules/perception/production/conf/perception/perception_common.flag, appending caddn_model_file and caddn_params_file so that the caddn_paddle model is loaded. Also change modules/perception/pipeline/config/camera_detection_pipeline.pb.txt to load caddn_paddle.
- Run the caddn_paddle model with: mainboard -d modules/perception/production/dag/dag_streaming_perception_camera.dag

The detector component then fails with this error:

terminate called after throwing an instance of 'phi::enforce::EnforceNotMet' what(): (External) CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. [Hint: Please search for the error code(9) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /apollo/data/Paddle/paddle/fluid/operators/grid_sampler_cudnn_op.cu.cc:81) [operator < grid_sampler > error] Aborted
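For triage, the fields that matter in this message are the cuDNN status code, the operator name, and the source location. The following is a minimal sketch that pulls them out of the text above. The regex and the `parse_cudnn_error` helper are illustrative names written for this one message and are not part of Paddle or Apollo.

```python
import re

# Illustrative pattern for the message quoted above; it is an assumption
# about this one message layout, not a format documented by Paddle.
ERROR_RE = re.compile(
    r"CUDNN error\((?P<code>\d+)\),\s*(?P<status>CUDNN_STATUS_[A-Z_]+)"
    r".*?\(at (?P<file>[^:()]+):(?P<line>\d+)\)"
    r".*?\[operator < (?P<op>\w+) > error\]",
    re.DOTALL,
)


def parse_cudnn_error(text):
    """Return the cuDNN code, status, source location, and operator, or None."""
    m = ERROR_RE.search(text)
    if m is None:
        return None
    return {
        "code": int(m.group("code")),
        "status": m.group("status"),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "operator": m.group("op"),
    }


# The error text exactly as reported in this issue.
message = (
    "terminate called after throwing an instance of 'phi::enforce::EnforceNotMet' "
    "what(): (External) CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. "
    "[Hint: Please search for the error code(9) on website "
    "(https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) "
    "to get Nvidia's official solution and advice about CUDNN Error.] "
    "(at /apollo/data/Paddle/paddle/fluid/operators/grid_sampler_cudnn_op.cu.cc:81) "
    "[operator < grid_sampler > error] Aborted"
)

info = parse_cudnn_error(message)
print(info)
```

Read this way, the failure comes from the cuDNN path of the grid_sampler operator (grid_sampler_cudnn_op.cu.cc, line 81), which is where the investigation would start.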

Supporting materials (screenshots, command lines, code/script snippets):

Feb 26 '24 08:02 Michael-Fuu

What is the output of apollo.sh config in the Docker container you are using? Maybe the CUDA version doesn't match the driver.
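One way to check for such a mismatch, sketched here under assumptions: the "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, not the toolkit installed in the container, so the container's toolkit version should not exceed it. The helper names below are hypothetical, and the toolkit versions passed in ("11.1", "11.8") are placeholders rather than values from this issue. The check also ignores CUDA compatibility packages, which can relax this rule.

```python
import re

# Matches the driver and CUDA fields of an nvidia-smi header line.
SMI_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>\d+\.\d+)"
)


def parse_smi_header(line):
    """Extract the driver version and the CUDA version reported by nvidia-smi."""
    m = SMI_RE.search(line)
    if m is None:
        raise ValueError("not an nvidia-smi header line")
    return m.group("driver"), m.group("cuda")


def as_tuple(version):
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def toolkit_fits_driver(toolkit_version, smi_line):
    """True when the toolkit version does not exceed what the driver reports."""
    _, driver_cuda = parse_smi_header(smi_line)
    return as_tuple(toolkit_version) <= as_tuple(driver_cuda)


# Header line as given in the system information of this issue.
header = "NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2"

print(parse_smi_header(header))
# Placeholder toolkit versions, used only to show both outcomes.
print(toolkit_fits_driver("11.1", header))
print(toolkit_fits_driver("11.8", header))
```

The toolkit version actually in use would have to be read inside the container, for example from the CUDA build that Paddle was compiled against.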

Feb 26 '24 09:02 CesarLiu

> What is the output of apollo.sh config in the Docker container you are using? Maybe the CUDA version doesn't match the driver.

I'm running the Apollo Docker container on Baidu Cloud, and apollo.sh config returns:

[CGPU-CUDA:ERR] cgpu auth check failed, proc will exit soon, please check your running environment, we only support baidu cloud.

But I don't think this is the problem; that message is related to GPU memory sharing. I've tried CUDA versions 11.2 and 12.2 and get the same error with both.

Feb 26 '24 09:02 Michael-Fuu

I think this may be a problem with the Paddle CADDN operator. I will check and report back.

Feb 27 '24 06:02 daohu527

Running caddn_paddle model get CUDNN_STATUS_NOT_SUPPORTED error
