
TensorRT 8.6.1.6 fails to convert the CO-DETR model on an RTX 4090 GPU

Open edwardnguyen1705 opened this issue 1 year ago • 4 comments

Description

I tried to convert the CO-DETR model to TRT, but it fails with the error below:

[12/12/2024-02:17:38] [E] Error[10]: Could not find any implementation for node {ForeignNode[/0/Cast_3.../0/backbone/Reshape_3 + /0/backbone/Transpose_3]}.

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: NVIDIA GeForce RTX 4090

NVIDIA Driver Version: 555.42.06

CUDA Version: 12.0

CUDNN Version:

Operating System: Ubuntu 22.04.3 LTS

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://drive.google.com/file/d/1voa7liji1OJxDQ8tphnbnm6EM-MexI_v/view?usp=drive_link

Steps To Reproduce

  • Env preparation: build the Docker image following TensorRT-Docker-Image
  • Run the Docker image and enter the container
  • PyTorch to ONNX: follow DeepStream-Yolo
  • ONNX to TRT: trtexec --onnx=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --saveEngine=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.engine --explicitBatch --minShapes=input:1x3x1280x1280 --optShapes=input:2x3x1280x1280 --maxShapes=input:4x3x1280x1280 --fp16 --memPoolSize=workspace:10000 --tacticSources=-cublasLt,+cublas --sparsity=enable --verbose

Commands or scripts:

Have you tried the latest release?: Not yet.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): If I follow the steps described in DeepStream-Yolo, the generated engine file works, but inference is slow. That is why I would like to use trtexec instead.
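For reference, the cross-framework check suggested in the template could be run like this (a sketch, assuming the same ONNX file name used in the trtexec command above; both invocations use standard Polygraphy flags):

```shell
# Sanity-check that the ONNX model runs at all under ONNXRuntime
polygraphy run co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --onnxrt

# Compare TensorRT outputs against ONNXRuntime, with FP16 enabled
# to match the --fp16 flag passed to trtexec
polygraphy run co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx \
    --trt --onnxrt --fp16
```

If the ONNXRuntime run succeeds but the TensorRT build fails, that narrows the problem to the TensorRT builder rather than the exported ONNX graph.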

edwardnguyen1705 avatar Dec 12 '24 02:12 edwardnguyen1705

Can you run a test with --memPoolSize=workspace:10000 removed? And then also try reducing the size of the dynamic shapes.
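Concretely, the suggestion above could be tried as follows (a sketch: the original command with --memPoolSize dropped and the dynamic-shape range narrowed to a max batch of 2; file names are carried over from the original report):

```shell
# Original trtexec command, minus --memPoolSize and with a smaller
# dynamic-shape range, to rule out workspace exhaustion as the cause
trtexec --onnx=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx \
        --saveEngine=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.engine \
        --minShapes=input:1x3x1280x1280 \
        --optShapes=input:1x3x1280x1280 \
        --maxShapes=input:2x3x1280x1280 \
        --fp16 --verbose
```

If this succeeds, the original failure was likely a tactic-selection problem tied to the workspace limit or the wider shape range, rather than an unsupported layer.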

lix19937 avatar Dec 16 '24 09:12 lix19937

@edwardnguyen1705 can you update the bug with the log from the recommendations from @lix19937 . Can you also share the ONNX model so we can test it ourselves? Thanks

asfiyab-nvidia avatar Dec 18 '24 21:12 asfiyab-nvidia


Thank you @asfiyab-nvidia ,

I have not tried @lix19937's suggestion yet, but I will try it soon.

Here is the ONNX model: https://drive.google.com/file/d/1voa7liji1OJxDQ8tphnbnm6EM-MexI_v/view?usp=drive_link

Happy Holidays!

edwardnguyen1705 avatar Dec 28 '24 10:12 edwardnguyen1705

I managed to compile the mmdetection Co-DETR model from PyTorch to TensorRT (without the ONNX intermediate representation) using Torch-TensorRT (e.g. torch.compile). I can export the model to a serialized TensorRT engine file that can be run in C++. Feel free to check out the project https://github.com/anenbergb/Co-DETR-TensorRT.
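For readers who want to try that route, a minimal sketch of the Torch-TensorRT path might look like the following. This is an assumption-laden illustration, not the linked repo's actual code: the stand-in module, input shape, and file name are hypothetical, and it requires a CUDA GPU plus the torch_tensorrt package.

```python
# Hypothetical sketch: compile a PyTorch model directly with Torch-TensorRT,
# skipping the ONNX intermediate step. `model` stands in for the real
# mmdetection Co-DETR module; see the linked repo for the actual integration.
import torch
import torch_tensorrt

# Placeholder module in place of Co-DETR, moved to GPU in eval mode
model = torch.nn.Conv2d(3, 16, kernel_size=3).eval().cuda()
example_input = torch.randn(1, 3, 1280, 1280).cuda()

# Compile to TensorRT, with FP16 enabled as in the trtexec attempt above
trt_model = torch_tensorrt.compile(
    model,
    inputs=[example_input],
    enabled_precisions={torch.float16},
)

# Serialize the compiled module so it can be reloaded later
torch_tensorrt.save(trt_model, "co_detr_trt.ep", inputs=[example_input])
```

Because compilation happens straight from the PyTorch graph, the ONNX export step (and node-mapping failures like the ForeignNode error above) is avoided entirely.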

anenbergb avatar Apr 21 '25 20:04 anenbergb