DataType mismatch in TensorRT 10.0.1.6 when running RT-DETRv2 on a GTX 1650 Ti
Description
I used the TensorRT C++ and CUDA APIs to run inference on the RT-DETRv2 model, and at runtime I found that two TensorRT APIs report inconsistent information. I call `nvinfer1::ICudaEngine::getTensorDataType` and `nvinfer1::ICudaEngine::getTensorFormatDesc` like this:
void allocator() {
    TensorNum = engine->getNbIOTensors();
    // Allocate memory for every tensor, inputs and outputs alike
    for (int i = 0; i < TensorNum; i++) {
        // Query the tensor's name, shape, and data type
        const char* name = engine->getIOTensorName(i);
        nvinfer1::Dims dims = engine->getTensorShape(name);
        nvinfer1::DataType type = engine->getTensorDataType(name);
        // Tensor I/O mode (input or output)
        const char* mode_name = engine->getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT ? "input" : "output";
        // Tensor name
        tensor_name.emplace_back(name);
        // Image size (height, width)
        tensor_size.emplace_back(std::make_pair(dims.d[2], dims.d[3]));
        // Compute the tensor's size in bytes
        int nbytes = perbytes[int(type)];
        for (int d = 0; d < dims.nbDims; d++)
            nbytes = nbytes * dims.d[d];
        tensor_bytes.emplace_back(nbytes);
        // Allocate device memory and record it in the map: name -> device pointer
        name_ptr.insert(std::make_pair(name, safeCudaMalloc(nbytes)));
        std::cout
            << " tensor mode : " << mode_name
            << " , tensor name : " << name
            << " , tensor dim : " << dims.d[0] << " X " << dims.d[1] << " X " << dims.d[2] << " X " << dims.d[3]
            << " , tensor bytes : " << nbytes
            << " , getTensorDataType output type : " << type_name[int(type)]
            << " , getTensorFormatDesc output description : " << engine->getTensorFormatDesc(name)
            << std::endl;
    }
}
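For completeness, `perbytes`, `type_name`, and `safeCudaMalloc` are my own helpers, not TensorRT APIs; they look roughly like this (a sketch, where the table layout assumes the `nvinfer1::DataType` enum values of TensorRT 10, with kINT64 = 8):

#include <cstdlib>
#include <iostream>
#include <cuda_runtime_api.h>

// Bytes per element, indexed by int(nvinfer1::DataType):
// kFLOAT=0, kHALF=1, kINT8=2, kINT32=3, kBOOL=4, kUINT8=5, kFP8=6, kBF16=7, kINT64=8
const int perbytes[] = { 4, 2, 1, 4, 1, 1, 1, 2, 8 };
const char* type_name[] = { "kFLOAT", "kHALF", "kINT8", "kINT32", "kBOOL", "kUINT8", "kFP8", "kBF16", "kINT64" };

// Thin wrapper around cudaMalloc that aborts on allocation failure
void* safeCudaMalloc(size_t nbytes) {
    void* ptr = nullptr;
    cudaError_t err = cudaMalloc(&ptr, nbytes);
    if (err != cudaSuccess) {
        std::cerr << "cudaMalloc failed: " << cudaGetErrorString(err) << std::endl;
        std::abort();
    }
    return ptr;
}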
I found that the information reported by the two APIs differs, as follows:
tensor mode : input , tensor name : images , tensor dim : 1 X 3 X 640 X 640 , tensor bytes : 4915200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
tensor mode : input , tensor name : orig_target_sizes , tensor dim : 1 X 2 X 0 X 0 , tensor bytes : 16 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
tensor mode : output , tensor name : labels , tensor dim : 1 X 300 X 0 X 0 , tensor bytes : 2400 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
tensor mode : output , tensor name : boxes , tensor dim : 1 X 300 X 4 X 0 , tensor bytes : 4800 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
tensor mode : output , tensor name : scores , tensor dim : 1 X 300 X 0 X 0 , tensor bytes : 1200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
Look at the tensor orig_target_sizes: getTensorDataType reports that its data type is kINT64, but getTensorFormatDesc reports INT8. In my testing, getTensorFormatDesc is the correct one: when I allocate CUDA memory according to what getTensorFormatDesc reports, the model runs without any problem. This troubles me a lot, because my allocator uses getTensorDataType to size the device buffers automatically, and now I have to size them by hand for each model. Could you please fix it?
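My current stopgap is a defensive sizing helper along these lines (a rough sketch of my own workaround, not a TensorRT API): it pulls the type token out of the getTensorFormatDesc string and takes the larger of the two implied element sizes, so the buffer is never under-allocated whichever API turns out to be right.

#include <algorithm>
#include <string>
#include <NvInferRuntime.h>

// Hypothetical workaround: size each element by the larger of the two
// interpretations reported by the engine.
size_t safeElementSize(const nvinfer1::ICudaEngine& engine, const char* name) {
    size_t fromType = perbytes[int(engine.getTensorDataType(name))]; // table above
    // getTensorFormatDesc returns a human-readable string such as
    // "Row major linear INT8 format (kLINEAR)"; match the type token in it.
    std::string desc = engine.getTensorFormatDesc(name);
    size_t fromDesc = fromType;
    if (desc.find("FP32") != std::string::npos)       fromDesc = 4;
    else if (desc.find("FP16") != std::string::npos)  fromDesc = 2;
    else if (desc.find("INT32") != std::string::npos) fromDesc = 4;
    else if (desc.find("INT8") != std::string::npos)  fromDesc = 1;
    return std::max(fromType, fromDesc);
}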
Environment
TensorRT Version: 10.0.1.6
NVIDIA GPU: GTX 1650 Ti (4 GB)
NVIDIA Driver Version: 12.2
CUDA Version: 12.2
CUDNN Version: 8.8
Operating System: Windows 11
Python Version (if applicable): None
Tensorflow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if so, version): None
Relevant Files
Model link: None
Steps To Reproduce
Commands or scripts: No
Have you tried the latest release?: Not yet
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes