TensorRT
Questions about GroundingDINO TensorRT acceleration
We exported the ONNX model with the script provided by helpful community members. The ONNX file takes about 5 seconds to infer one image on CPU and 2 seconds on GPU. The CPU is a 10th-generation Intel i5 and the GPU is an RTX 3060 12GB. However, running the PyTorch .pth on the GPU takes 300 ms, and normally ONNX inference should be faster than PyTorch. Has anyone else encountered this? Also, because some ONNX operators have no corresponding TensorRT operators, the model cannot run normally when we try to convert it to TRT. Has anyone converted it successfully? Thanks!
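For reference, here is roughly how the ONNX latency can be measured with ONNX Runtime on GPU (a minimal sketch; the input names come from our export, but the shapes, dtypes, and the 6-token prompt length are assumptions; check sess.get_inputs() against your own model):

import time
import numpy as np
import onnxruntime as ort

# Prefer the CUDA execution provider; fall back to CPU if unavailable.
sess = ort.InferenceSession(
    "grounded_qunchun_sim.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

L = 6  # assumed prompt length
feeds = {
    "img": np.random.rand(1, 3, 800, 1200).astype(np.float32),
    "input_ids": np.ones((1, L), dtype=np.int64),
    "attention_mask": np.ones((1, L), dtype=np.int64),
    "position_ids": np.arange(L, dtype=np.int64)[None, :],
    "token_type_ids": np.zeros((1, L), dtype=np.int64),
    "text_token_mask": np.ones((1, L, L), dtype=bool),
}

# Warm up first: the first runs pay one-time CUDA and kernel setup costs,
# which is a common reason GPU inference looks slower than it really is.
for _ in range(3):
    sess.run(None, feeds)

n = 20
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, feeds)
print(f"mean latency: {(time.perf_counter() - t0) / n * 1000:.1f} ms")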
How do you measure TRT perf? Could you please try trtexec? It's a binary that comes with the TRT package for perf measurement; a typical usage would be like
trtexec --onnx=model.onnx --fp16 --int8
This is our onnx2trt output log:
trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8600] # trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
[01/03/2024-23:26:06] [I] === Model Options ===
[01/03/2024-23:26:06] [I] Format: ONNX
[01/03/2024-23:26:06] [I] Model: /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx
[01/03/2024-23:26:06] [I] Output:
[01/03/2024-23:26:06] [I] === Build Options ===
[01/03/2024-23:26:06] [I] Max batch: explicit batch
[01/03/2024-23:26:06] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/03/2024-23:26:06] [I] minTiming: 1
[01/03/2024-23:26:06] [I] avgTiming: 8
[01/03/2024-23:26:06] [I] Precision: FP32+FP16
[01/03/2024-23:26:06] [I] LayerPrecisions:
[01/03/2024-23:26:06] [I] Layer Device Types:
[01/03/2024-23:26:06] [I] Calibration:
[01/03/2024-23:26:06] [I] Refit: Disabled
[01/03/2024-23:26:06] [I] Version Compatible: Disabled
[01/03/2024-23:26:06] [I] TensorRT runtime: full
[01/03/2024-23:26:06] [I] Lean DLL Path:
[01/03/2024-23:26:06] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/03/2024-23:26:06] [I] Exclude Lean Runtime: Disabled
[01/03/2024-23:26:06] [I] Sparsity: Disabled
[01/03/2024-23:26:06] [I] Safe mode: Disabled
[01/03/2024-23:26:06] [I] DirectIO mode: Disabled
[01/03/2024-23:26:06] [I] Restricted mode: Disabled
[01/03/2024-23:26:06] [I] Skip inference: Disabled
[01/03/2024-23:26:06] [I] Save engine: result.engine
[01/03/2024-23:26:06] [I] Load engine:
[01/03/2024-23:26:06] [I] Profiling verbosity: 0
[01/03/2024-23:26:06] [I] Tactic sources: Using default tactic sources
[01/03/2024-23:26:06] [I] timingCacheMode: local
[01/03/2024-23:26:06] [I] timingCacheFile:
[01/03/2024-23:26:06] [I] Heuristic: Disabled
[01/03/2024-23:26:06] [I] Preview Features: Use default preview flags.
[01/03/2024-23:26:06] [I] MaxAuxStreams: -1
[01/03/2024-23:26:06] [I] BuilderOptimizationLevel: 3
[01/03/2024-23:26:06] [I] Input(s)s format: fp32:CHW
[01/03/2024-23:26:06] [I] Output(s)s format: fp32:CHW
[01/03/2024-23:26:06] [I] Input build shapes: model
[01/03/2024-23:26:06] [I] Input calibration shapes: model
[01/03/2024-23:26:06] [I] === System Options ===
[01/03/2024-23:26:06] [I] Device: 0
[01/03/2024-23:26:06] [I] DLACore:
[01/03/2024-23:26:06] [I] Plugins:
[01/03/2024-23:26:06] [I] setPluginsToSerialize:
[01/03/2024-23:26:06] [I] dynamicPlugins:
[01/03/2024-23:26:06] [I] ignoreParsedPluginLibs: 0
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] === Inference Options ===
[01/03/2024-23:26:06] [I] Batch: Explicit
[01/03/2024-23:26:06] [I] Input inference shapes: model
[01/03/2024-23:26:06] [I] Iterations: 10
[01/03/2024-23:26:06] [I] Duration: 3s (+ 200ms warm up)
[01/03/2024-23:26:06] [I] Sleep time: 0ms
[01/03/2024-23:26:06] [I] Idle time: 0ms
[01/03/2024-23:26:06] [I] Inference Streams: 1
[01/03/2024-23:26:06] [I] ExposeDMA: Disabled
[01/03/2024-23:26:06] [I] Data transfers: Enabled
[01/03/2024-23:26:06] [I] Spin-wait: Disabled
[01/03/2024-23:26:06] [I] Multithreading: Disabled
[01/03/2024-23:26:06] [I] CUDA Graph: Disabled
[01/03/2024-23:26:06] [I] Separate profiling: Disabled
[01/03/2024-23:26:06] [I] Time Deserialize: Disabled
[01/03/2024-23:26:06] [I] Time Refit: Disabled
[01/03/2024-23:26:06] [I] NVTX verbosity: 0
[01/03/2024-23:26:06] [I] Persistent Cache Ratio: 0
[01/03/2024-23:26:06] [I] Inputs:
[01/03/2024-23:26:06] [I] === Reporting Options ===
[01/03/2024-23:26:06] [I] Verbose: Disabled
[01/03/2024-23:26:06] [I] Averages: 10 inferences
[01/03/2024-23:26:06] [I] Percentiles: 90,95,99
[01/03/2024-23:26:06] [I] Dump refittable layers:Disabled
[01/03/2024-23:26:06] [I] Dump output: Disabled
[01/03/2024-23:26:06] [I] Profile: Disabled
[01/03/2024-23:26:06] [I] Export timing to JSON file:
[01/03/2024-23:26:06] [I] Export output to JSON file:
[01/03/2024-23:26:06] [I] Export profile to JSON file:
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] === Device Information ===
[01/03/2024-23:26:06] [I] Selected Device: NVIDIA GeForce RTX 3060
[01/03/2024-23:26:06] [I] Compute Capability: 8.6
[01/03/2024-23:26:06] [I] SMs: 28
[01/03/2024-23:26:06] [I] Device Global Memory: 12036 MiB
[01/03/2024-23:26:06] [I] Shared Memory per SM: 100 KiB
[01/03/2024-23:26:06] [I] Memory Bus Width: 192 bits (ECC disabled)
[01/03/2024-23:26:06] [I] Application Compute Clock Rate: 1.777 GHz
[01/03/2024-23:26:06] [I] Application Memory Clock Rate: 7.501 GHz
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] TensorRT version: 8.6.0
[01/03/2024-23:26:06] [I] Loading standard plugins
[01/03/2024-23:26:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +215, GPU +0, now: CPU 221, GPU 799 (MiB)
[01/03/2024-23:26:11] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1214, GPU +264, now: CPU 1509, GPU 1063 (MiB)
[01/03/2024-23:26:11] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[01/03/2024-23:26:11] [I] Start parsing network model.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 703540943
[01/03/2024-23:26:12] [I] [TRT] ----------------------------------------------------------------
[01/03/2024-23:26:12] [I] [TRT] Input filename: /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx
[01/03/2024-23:26:12] [I] [TRT] ONNX IR version: 0.0.8
[01/03/2024-23:26:12] [I] [TRT] Opset version: 16
[01/03/2024-23:26:12] [I] [TRT] Producer name: pytorch
[01/03/2024-23:26:12] [I] [TRT] Producer version: 2.0.1
[01/03/2024-23:26:12] [I] [TRT] Domain:
[01/03/2024-23:26:12] [I] [TRT] Model version: 0
[01/03/2024-23:26:12] [I] [TRT] Doc string:
[01/03/2024-23:26:12] [I] [TRT] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 703540943
[01/03/2024-23:26:12] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/03/2024-23:26:12] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[01/03/2024-23:26:12] [I] Finished parsing network model. Parse time: 0.879312
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: img, but no shapes were provided. Automatically overriding shape to: 1x3x1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: input_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: attention_mask, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: position_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: token_type_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: text_token_mask, but no shapes were provided. Automatically overriding shape to: 1x1x1
[01/03/2024-23:26:12] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::2013] Error Code 4: Miscellaneous (IShuffleLayer /backbone/backbone.0/patch_embed/Reshape_1: reshape wildcard -1 has infinite number of solutions or no solution. Reshaping [1,96,0] to [-1,96,0,0].)
[01/03/2024-23:26:12] [E] Engine could not be created from network
[01/03/2024-23:26:12] [E] Building engine failed
[01/03/2024-23:26:12] [E] Failed to create engine from model or file.
[01/03/2024-23:26:12] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8600] # trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
We are unable to convert to TRT using the trtexec tool.
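For reference, the declared inputs and their dynamic dimensions can be listed straight from the ONNX file (a minimal sketch, assuming the onnx Python package is installed):

import onnx

model = onnx.load("grounded_qunchun_sim.onnx")
for inp in model.graph.input:
    # dim_param holds the symbolic name of a dynamic axis, dim_value a fixed size.
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

Whatever this prints as dynamic is what has to be pinned down on the trtexec command line.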
You need to set compatible input shapes; check trtexec -h and look at --optShapes.
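For example (a sketch only; the 800x1200 image size and the 6-token prompt length are assumptions that must match your export and tokenizer output):

trtexec --onnx=grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16 \
  --minShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6 \
  --optShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6 \
  --maxShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6

Note that if only --optShapes is given, trtexec reuses it for --minShapes and --maxShapes, so the resulting engine is fixed to those shapes.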
The engine file has been obtained, but during inference the engine output cannot be aligned with the output of the PyTorch model. I can provide the engine file and inference scripts.
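A quick way to quantify the mismatch is to compare both runtimes on identical inputs (a minimal sketch; pt_logits, trt_logits and friends are placeholders for outputs captured from each runtime):

import numpy as np

def compare(pt_out, trt_out, name):
    """Print max abs diff and cosine similarity of two output tensors."""
    pt = np.asarray(pt_out, dtype=np.float32).ravel()
    trt = np.asarray(trt_out, dtype=np.float32).ravel()
    cos = float(pt @ trt / (np.linalg.norm(pt) * np.linalg.norm(trt) + 1e-12))
    print(f"{name}: max abs diff={np.abs(pt - trt).max():.4g}, cosine={cos:.6f}")

# Placeholders: capture these from the PyTorch model and the TRT engine
# on the same already-preprocessed inputs, then e.g.:
# compare(pt_logits, trt_logits, "logits")
# compare(pt_boxes, trt_boxes, "boxes")

With FP16 enabled, a small max abs diff on the logits is not unusual; a cosine similarity far below 1.0 points at a real bug.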
During the ONNX-to-TensorRT conversion I added the --optShapes parameter, which covers all six inputs. These are the six inputs that were dynamic when exporting from PyTorch to ONNX.
The command I used with the trtexec tool is as follows:
./trtexec --onnx=/home/liufurui/TensorRT-8.6.1.6/bin/grounded_v3_sim.onnx --saveEngine=resultv2_fp16.engine --optShapes=img:1x3x800x1200,text_token_mask:1x6x6,token_type_ids:1x6,position_ids:1x6,input_ids:1x6,attention_mask:1x6 --workspace=10000 --fp16
It was finally converted into an engine file successfully, but when running inference with the engine, the outputs still cannot be aligned.
Do you have any other suggestions? Thank you.
It looks like something caused by pre-processing or post-processing.
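One easy check is that the engine receives exactly the same preprocessed tensor as the PyTorch model. A minimal sketch of GroundingDINO-style image preprocessing (the fixed 800x1200 size is an assumption; the official repo uses aspect-preserving resizing, and the ImageNet mean/std below are the values its transforms normalize with):

import numpy as np
from PIL import Image

def preprocess(path, height=800, width=1200):
    """Resize, scale to [0, 1], normalize with ImageNet stats, return NCHW float32."""
    img = Image.open(path).convert("RGB").resize((width, height))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW plus batch dim

Feeding this exact array to both runtimes rules the image path in or out; the tokenizer outputs (input_ids and the masks) deserve the same byte-for-byte check.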
It is possible that during the onnx2tensorrt process, TensorRT reduces the precision of some operators to improve speed.
Not likely. You can confirm this with Polygraphy; usage is like polygraphy run model.onnx --trt --onnxrt
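For a model with dynamic inputs the shapes have to be passed too; something like this (a sketch; the shape values are assumptions that must match the export):

polygraphy run grounded_qunchun_sim.onnx --trt --onnxrt \
  --input-shapes img:[1,3,800,1200] input_ids:[1,6] attention_mask:[1,6] position_ids:[1,6] token_type_ids:[1,6] text_token_mask:[1,6,6] \
  --atol 1e-3 --rtol 1e-3

Running it once without --fp16 and once with it separates conversion bugs from precision loss.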
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!