Ultra-Light-Fast-Generic-Face-Detector-1MB

Problem about running onnx model on TensorRT lib

Open pango99 opened this issue 3 years ago • 2 comments

Hi: I am trying to run the ONNX model with the NVIDIA TensorRT library. First I loaded the version-RFB-320.onnx model. TensorRT reports the warnings below, and the detection result is wrong:

onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped

I also tested version-RFB-640.onnx, and it has the same problem. Is the INT64 -> INT32 conversion the cause of the wrong results? For running on TensorRT, should I use the "simplified" or the "without_postprocessing" version of the model?

pango99 • Apr 01 '21 10:04
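For context on the warning itself, the clamping it describes can be sketched in a few lines. This is an illustration only, not TensorRT's code. The sample values are assumptions, chosen because ONNX exporters often store INT64 sentinels such as INT64_MAX to mean "slice to the end"; clamping such a value to INT32_MAX usually does not change what the slice selects.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def clamp_to_int32(values):
    """Clamp each integer into the INT32 range, as the warning describes."""
    return [min(max(v, INT32_MIN), INT32_MAX) for v in values]


# Hypothetical INT64 weights: two ordinary indices and an INT64_MAX sentinel.
weights = [0, 320, 2**63 - 1]
clamped = clamp_to_int32(weights)
print(clamped)
```

Values that already fit in INT32 pass through unchanged; only the out-of-range sentinel is altered.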

Hi, this conversion is not the source of the wrong detections. Rather, you need to preprocess the input in a specific way. Suppose input_data is your image as read by cv2.imread. Then you need to run:

input_prep = np.expand_dims(np.transpose(input_data, (2, 0, 1)), axis=0).astype(np.float32) / 255.
input_prep = np.array(input_prep, dtype=input_prep.dtype, order='C')

and feed input_prep to your engine.

k-sokolov • Apr 08 '21 12:04
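A minimal, self-contained sketch of the preprocessing described in the comment above, wrapped in a function. The function name and the synthetic 240x320 image (standing in for the result of cv2.imread) are assumptions for illustration. Note that the snippet does not resize, so the image must already match the input size the engine was built for.

```python
import numpy as np


def preprocess(input_data):
    """Convert an HWC uint8 image to a contiguous NCHW float32 tensor in [0, 1]."""
    chw = np.transpose(input_data, (2, 0, 1))             # HWC -> CHW
    nchw = np.expand_dims(chw, axis=0).astype(np.float32) / 255.0
    return np.ascontiguousarray(nchw)                     # C-order memory layout


# Stand-in for cv2.imread(...): a 240x320 image with only the first
# channel set (blue, since OpenCV loads images in BGR order).
image = np.zeros((240, 320, 3), dtype=np.uint8)
image[..., 0] = 255

input_prep = preprocess(image)
print(input_prep.shape, input_prep.dtype, input_prep.flags["C_CONTIGUOUS"])
```

The contiguity step matters when the buffer is copied to the device as raw bytes, because a transposed NumPy view does not by itself reorder the underlying memory.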

Hi k-sokolov, thanks for your reply. My program is written in C, not Python, and I am not skilled in Python, so I rewrote my C preprocessing code as below:

`cv::Mat inputDetImage = cv::dnn::blobFromImage(*detImage, 1.0 / 255.0, gCnnInputSize, cv::Scalar(0, 0, 0), true);`

`cudaMemcpy(gEngInputBuff_CUDA, inputDetImage.data, gCnnInputSize.width * gCnnInputSize.height * 3 * sizeof(float), cudaMemcpyHostToDevice);`

I think cv::dnn::blobFromImage() should produce the same data as your code, but the detection result is still wrong. Where is my code wrong?

pango99 • Apr 09 '21 10:04
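One difference between the two snippets that may be worth checking: the fifth argument of cv::dnn::blobFromImage is swapRB, and passing true swaps the first and last channels (BGR to RGB), whereas the NumPy snippet keeps the BGR order from cv2.imread. The sketch below is a rough NumPy emulation of that call (scale by 1/255, zero mean, optional channel swap) that ignores the resize step; the function names are hypothetical, and which channel order the model expects is not stated in this thread.

```python
import numpy as np


def numpy_preprocess(img):
    """The NumPy preprocessing from the earlier comment: HWC uint8 -> NCHW float32."""
    nchw = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)
    return np.ascontiguousarray(nchw.astype(np.float32) / 255.0)


def blob_like(bgr, swap_rb):
    """Rough emulation of blobFromImage(img, 1/255, mean=0, swapRB=swap_rb), no resize."""
    img = bgr[..., ::-1] if swap_rb else bgr  # reverse channel axis: BGR -> RGB
    return numpy_preprocess(img)


# Synthetic pure-blue BGR image in place of a real photo.
bgr = np.zeros((240, 320, 3), dtype=np.uint8)
bgr[..., 0] = 255

reference = numpy_preprocess(bgr)
swapped = blob_like(bgr, swap_rb=True)
unswapped = blob_like(bgr, swap_rb=False)

print(np.array_equal(reference, swapped))    # channel order differs
print(np.array_equal(reference, unswapped))  # same data without the swap
print(reference.nbytes == 320 * 240 * 3 * 4) # matches the cudaMemcpy byte count
```

Under these assumptions the two pipelines only agree when the channel swap is disabled, so comparing the first few floats of the C blob against the Python tensor for the same image is a quick way to confirm whether the inputs really match.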