Conversion to TRT fails with TensorRT 8.6.1.6 when converting a CO-DETR model on an RTX 4090 GPU
Description
I tried to convert the CO-DETR model to TRT, but the conversion fails with the error below:
[12/12/2024-02:17:38] [E] Error[10]: Could not find any implementation for node {ForeignNode[/0/Cast_3.../0/backbone/Reshape_3 + /0/backbone/Transpose_3]}.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: NVIDIA GeForce RTX 4090
NVIDIA Driver Version: 555.42.06
CUDA Version: 12.0
CUDNN Version:
Operating System: Ubuntu 22.04.3 LTS
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link: https://drive.google.com/file/d/1voa7liji1OJxDQ8tphnbnm6EM-MexI_v/view?usp=drive_link
Steps To Reproduce
- Env preparation: build the Docker image following TensorRT-Docker-Image
- Run the image and enter the Docker container
- PyTorch to ONNX: follow DeepStream-Yolo
- ONNX to TRT:
trtexec --onnx=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --saveEngine=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.engine --explicitBatch --minShapes=input:1x3x1280x1280 --optShapes=input:2x3x1280x1280 --maxShapes=input:4x3x1280x1280 --fp16 --memPoolSize=workspace:10000 --tacticSources=-cublasLt,+cublas --sparsity=enable --verbose
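In case it helps triage, here is an unverified workaround sketch (the "_folded" filenames are placeholders, not something I have run): Polygraphy's constant folding sometimes eliminates Cast/Reshape/Transpose chains like the one in the failing ForeignNode, after which the same build can be retried on the folded model.

# Unverified sketch: fold constants with Polygraphy (bundled in the TensorRT container), then retry the build. The "_folded" names are placeholders.
polygraphy surgeon sanitize co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --fold-constants -o co_dino_5scale_swin_large_16e_o365tococo_h1280w1280_folded.onnx
trtexec --onnx=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280_folded.onnx --saveEngine=co_dino_5scale_swin_large_16e_o365tococo_h1280w1280_folded.engine --minShapes=input:1x3x1280x1280 --optShapes=input:2x3x1280x1280 --maxShapes=input:4x3x1280x1280 --fp16 --memPoolSize=workspace:10000 --verbose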
Commands or scripts:
Have you tried the latest release?: Not yet.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): If I follow the steps described in DeepStream-Yolo, the generated engine file works, but inference is slow. That is why I would like to build the engine with trtexec instead.
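For reference, the ONNX-Runtime check from the question above, spelled out for this model; the TRT-vs-ONNXRT comparison with an explicit shape profile is my assumption of how the failing build path could be exercised through Polygraphy:

# Run the ONNX model under ONNX-Runtime only, as the template suggests.
polygraphy run co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --onnxrt
# Assumption: compare TensorRT against ONNX-Runtime using the same shape profile as the trtexec command above.
polygraphy run co_dino_5scale_swin_large_16e_o365tococo_h1280w1280.onnx --trt --onnxrt --trt-min-shapes input:[1,3,1280,1280] --trt-opt-shapes input:[2,3,1280,1280] --trt-max-shapes input:[4,3,1280,1280]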