TensorRT
Questions about GroundingDINO TensorRT acceleration
We exported the ONNX model with the script provided by helpful community members. The ONNX file takes about 5 seconds to infer one image on CPU and 2 seconds on GPU. The CPU is a 10th-generation Intel i5 and the GPU is an RTX 3060 12GB. However, running the PyTorch .pth on the GPU takes 300 ms, and normally ONNX inference should be faster than PyTorch. Has anyone else encountered this? Also, because some ONNX operators have no corresponding TensorRT operators, the model cannot run normally when we try to convert it to TRT. Has anyone converted it successfully? Thanks!
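For reference, here is roughly how the ONNX latency can be measured with ONNX Runtime on GPU (a minimal sketch; the input names come from our export, but the shapes, dtypes, and the 6-token prompt length are assumptions; check sess.get_inputs() against your own model):

import time
import numpy as np
import onnxruntime as ort

# Prefer the CUDA execution provider; fall back to CPU if unavailable.
sess = ort.InferenceSession(
    "grounded_qunchun_sim.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

L = 6  # assumed prompt length
feeds = {
    "img": np.random.rand(1, 3, 800, 1200).astype(np.float32),
    "input_ids": np.ones((1, L), dtype=np.int64),
    "attention_mask": np.ones((1, L), dtype=np.int64),
    "position_ids": np.arange(L, dtype=np.int64)[None, :],
    "token_type_ids": np.zeros((1, L), dtype=np.int64),
    "text_token_mask": np.ones((1, L, L), dtype=bool),
}

# Warm up first: the first runs pay one-time CUDA and kernel setup costs,
# which is a common reason GPU inference looks slower than it really is.
for _ in range(3):
    sess.run(None, feeds)

n = 20
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, feeds)
print(f"mean latency: {(time.perf_counter() - t0) / n * 1000:.1f} ms")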
How do you measure TRT perf? Could you please try trtexec? It's a binary that comes with the TRT package for perf measurement; a typical usage would be like
trtexec --onnx=model.onnx --fp16 --int8
This is our onnx2trt output log:
trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8600] # trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
[01/03/2024-23:26:06] [I] === Model Options ===
[01/03/2024-23:26:06] [I] Format: ONNX
[01/03/2024-23:26:06] [I] Model: /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx
[01/03/2024-23:26:06] [I] Output:
[01/03/2024-23:26:06] [I] === Build Options ===
[01/03/2024-23:26:06] [I] Max batch: explicit batch
[01/03/2024-23:26:06] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/03/2024-23:26:06] [I] minTiming: 1
[01/03/2024-23:26:06] [I] avgTiming: 8
[01/03/2024-23:26:06] [I] Precision: FP32+FP16
[01/03/2024-23:26:06] [I] LayerPrecisions:
[01/03/2024-23:26:06] [I] Layer Device Types:
[01/03/2024-23:26:06] [I] Calibration:
[01/03/2024-23:26:06] [I] Refit: Disabled
[01/03/2024-23:26:06] [I] Version Compatible: Disabled
[01/03/2024-23:26:06] [I] TensorRT runtime: full
[01/03/2024-23:26:06] [I] Lean DLL Path:
[01/03/2024-23:26:06] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/03/2024-23:26:06] [I] Exclude Lean Runtime: Disabled
[01/03/2024-23:26:06] [I] Sparsity: Disabled
[01/03/2024-23:26:06] [I] Safe mode: Disabled
[01/03/2024-23:26:06] [I] DirectIO mode: Disabled
[01/03/2024-23:26:06] [I] Restricted mode: Disabled
[01/03/2024-23:26:06] [I] Skip inference: Disabled
[01/03/2024-23:26:06] [I] Save engine: result.engine
[01/03/2024-23:26:06] [I] Load engine:
[01/03/2024-23:26:06] [I] Profiling verbosity: 0
[01/03/2024-23:26:06] [I] Tactic sources: Using default tactic sources
[01/03/2024-23:26:06] [I] timingCacheMode: local
[01/03/2024-23:26:06] [I] timingCacheFile:
[01/03/2024-23:26:06] [I] Heuristic: Disabled
[01/03/2024-23:26:06] [I] Preview Features: Use default preview flags.
[01/03/2024-23:26:06] [I] MaxAuxStreams: -1
[01/03/2024-23:26:06] [I] BuilderOptimizationLevel: 3
[01/03/2024-23:26:06] [I] Input(s)s format: fp32:CHW
[01/03/2024-23:26:06] [I] Output(s)s format: fp32:CHW
[01/03/2024-23:26:06] [I] Input build shapes: model
[01/03/2024-23:26:06] [I] Input calibration shapes: model
[01/03/2024-23:26:06] [I] === System Options ===
[01/03/2024-23:26:06] [I] Device: 0
[01/03/2024-23:26:06] [I] DLACore:
[01/03/2024-23:26:06] [I] Plugins:
[01/03/2024-23:26:06] [I] setPluginsToSerialize:
[01/03/2024-23:26:06] [I] dynamicPlugins:
[01/03/2024-23:26:06] [I] ignoreParsedPluginLibs: 0
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] === Inference Options ===
[01/03/2024-23:26:06] [I] Batch: Explicit
[01/03/2024-23:26:06] [I] Input inference shapes: model
[01/03/2024-23:26:06] [I] Iterations: 10
[01/03/2024-23:26:06] [I] Duration: 3s (+ 200ms warm up)
[01/03/2024-23:26:06] [I] Sleep time: 0ms
[01/03/2024-23:26:06] [I] Idle time: 0ms
[01/03/2024-23:26:06] [I] Inference Streams: 1
[01/03/2024-23:26:06] [I] ExposeDMA: Disabled
[01/03/2024-23:26:06] [I] Data transfers: Enabled
[01/03/2024-23:26:06] [I] Spin-wait: Disabled
[01/03/2024-23:26:06] [I] Multithreading: Disabled
[01/03/2024-23:26:06] [I] CUDA Graph: Disabled
[01/03/2024-23:26:06] [I] Separate profiling: Disabled
[01/03/2024-23:26:06] [I] Time Deserialize: Disabled
[01/03/2024-23:26:06] [I] Time Refit: Disabled
[01/03/2024-23:26:06] [I] NVTX verbosity: 0
[01/03/2024-23:26:06] [I] Persistent Cache Ratio: 0
[01/03/2024-23:26:06] [I] Inputs:
[01/03/2024-23:26:06] [I] === Reporting Options ===
[01/03/2024-23:26:06] [I] Verbose: Disabled
[01/03/2024-23:26:06] [I] Averages: 10 inferences
[01/03/2024-23:26:06] [I] Percentiles: 90,95,99
[01/03/2024-23:26:06] [I] Dump refittable layers:Disabled
[01/03/2024-23:26:06] [I] Dump output: Disabled
[01/03/2024-23:26:06] [I] Profile: Disabled
[01/03/2024-23:26:06] [I] Export timing to JSON file:
[01/03/2024-23:26:06] [I] Export output to JSON file:
[01/03/2024-23:26:06] [I] Export profile to JSON file:
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] === Device Information ===
[01/03/2024-23:26:06] [I] Selected Device: NVIDIA GeForce RTX 3060
[01/03/2024-23:26:06] [I] Compute Capability: 8.6
[01/03/2024-23:26:06] [I] SMs: 28
[01/03/2024-23:26:06] [I] Device Global Memory: 12036 MiB
[01/03/2024-23:26:06] [I] Shared Memory per SM: 100 KiB
[01/03/2024-23:26:06] [I] Memory Bus Width: 192 bits (ECC disabled)
[01/03/2024-23:26:06] [I] Application Compute Clock Rate: 1.777 GHz
[01/03/2024-23:26:06] [I] Application Memory Clock Rate: 7.501 GHz
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/03/2024-23:26:06] [I]
[01/03/2024-23:26:06] [I] TensorRT version: 8.6.0
[01/03/2024-23:26:06] [I] Loading standard plugins
[01/03/2024-23:26:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +215, GPU +0, now: CPU 221, GPU 799 (MiB)
[01/03/2024-23:26:11] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1214, GPU +264, now: CPU 1509, GPU 1063 (MiB)
[01/03/2024-23:26:11] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[01/03/2024-23:26:11] [I] Start parsing network model.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 703540943
[01/03/2024-23:26:12] [I] [TRT] ----------------------------------------------------------------
[01/03/2024-23:26:12] [I] [TRT] Input filename: /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx
[01/03/2024-23:26:12] [I] [TRT] ONNX IR version: 0.0.8
[01/03/2024-23:26:12] [I] [TRT] Opset version: 16
[01/03/2024-23:26:12] [I] [TRT] Producer name: pytorch
[01/03/2024-23:26:12] [I] [TRT] Producer version: 2.0.1
[01/03/2024-23:26:12] [I] [TRT] Domain:
[01/03/2024-23:26:12] [I] [TRT] Model version: 0
[01/03/2024-23:26:12] [I] [TRT] Doc string:
[01/03/2024-23:26:12] [I] [TRT] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 703540943
[01/03/2024-23:26:12] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/03/2024-23:26:12] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[01/03/2024-23:26:12] [I] Finished parsing network model. Parse time: 0.879312
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: img, but no shapes were provided. Automatically overriding shape to: 1x3x1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: input_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: attention_mask, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: position_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: token_type_ids, but no shapes were provided. Automatically overriding shape to: 1x1
[01/03/2024-23:26:12] [W] Dynamic dimensions required for input: text_token_mask, but no shapes were provided. Automatically overriding shape to: 1x1x1
[01/03/2024-23:26:12] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::2013] Error Code 4: Miscellaneous (IShuffleLayer /backbone/backbone.0/patch_embed/Reshape_1: reshape wildcard -1 has infinite number of solutions or no solution. Reshaping [1,96,0] to [-1,96,0,0].)
[01/03/2024-23:26:12] [E] Engine could not be created from network
[01/03/2024-23:26:12] [E] Building engine failed
[01/03/2024-23:26:12] [E] Failed to create engine from model or file.
[01/03/2024-23:26:12] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8600] # trtexec --onnx=/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/quchun/GroundingDINO/grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16
We are unable to convert to TRT using the trtexec tool.
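For reference, the declared inputs and their dynamic dimensions can be listed straight from the ONNX file (a minimal sketch, assuming the onnx Python package is installed):

import onnx

model = onnx.load("grounded_qunchun_sim.onnx")
for inp in model.graph.input:
    # dim_param holds the symbolic name of a dynamic axis, dim_value a fixed size.
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

Whatever this prints as dynamic is what has to be pinned down on the trtexec command line.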
You need to set compatible input shapes; check trtexec -h and look at --optShapes.
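For example (a sketch only; the 800x1200 image size and the 6-token prompt length are assumptions that must match your export and tokenizer output):

trtexec --onnx=grounded_qunchun_sim.onnx --saveEngine=result.engine --fp16 \
  --minShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6 \
  --optShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6 \
  --maxShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6

Note that if only --optShapes is given, trtexec reuses it for --minShapes and --maxShapes, so the resulting engine is fixed to those shapes.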
The engine file has been obtained, but during inference the engine output cannot be aligned with the output of the PyTorch model. I can provide the engine file and inference scripts.
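A quick way to quantify the mismatch is to compare both runtimes on identical inputs (a minimal sketch; pt_logits, trt_logits and friends are placeholders for outputs captured from each runtime):

import numpy as np

def compare(pt_out, trt_out, name):
    """Print max abs diff and cosine similarity of two output tensors."""
    pt = np.asarray(pt_out, dtype=np.float32).ravel()
    trt = np.asarray(trt_out, dtype=np.float32).ravel()
    cos = float(pt @ trt / (np.linalg.norm(pt) * np.linalg.norm(trt) + 1e-12))
    print(f"{name}: max abs diff={np.abs(pt - trt).max():.4g}, cosine={cos:.6f}")

# Placeholders: capture these from the PyTorch model and the TRT engine
# on the same already-preprocessed inputs, then e.g.:
# compare(pt_logits, trt_logits, "logits")
# compare(pt_boxes, trt_boxes, "boxes")

With FP16 enabled, a small max abs diff on the logits is not unusual; a cosine similarity far below 1.0 points at a real bug.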
During the ONNX-to-TensorRT conversion I added the --optShapes parameter, which covers all six inputs. These are the six inputs that were dynamic when exporting from PyTorch to ONNX.
The command I used with the trtexec tool is as follows:
./trtexec --onnx=/home/liufurui/TensorRT-8.6.1.6/bin/grounded_v3_sim.onnx --saveEngine=resultv2_fp16.engine --optShapes=img:1x3x800x1200,text_token_mask:1x6x6,token_type_ids:1x6,position_ids:1x6,input_ids:1x6,attention_mask:1x6 --workspace=10000 --fp16
It was finally converted into an engine file successfully, but when running inference with the engine, the outputs still cannot be aligned.
Do you have any other suggestions? Thank you.
It looks like something caused by pre-processing or post-processing.
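One easy check is that the engine receives exactly the same preprocessed tensor as the PyTorch model. A minimal sketch of GroundingDINO-style image preprocessing (the fixed 800x1200 size is an assumption; the official repo uses aspect-preserving resizing, and the ImageNet mean/std below are the values its transforms normalize with):

import numpy as np
from PIL import Image

def preprocess(path, height=800, width=1200):
    """Resize, scale to [0, 1], normalize with ImageNet stats, return NCHW float32."""
    img = Image.open(path).convert("RGB").resize((width, height))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW plus batch dim

Feeding this exact array to both runtimes rules the image path in or out; the tokenizer outputs (input_ids and the masks) deserve the same byte-for-byte check.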
It is possible that during the onnx2tensorrt process, TensorRT reduces the precision of some operators to improve speed.
Not likely. You can confirm this with Polygraphy; usage is like polygraphy run model.onnx --trt --onnxrt
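For a model with dynamic inputs the shapes have to be passed too; something like this (a sketch; the shape values are assumptions that must match the export):

polygraphy run grounded_qunchun_sim.onnx --trt --onnxrt \
  --input-shapes img:[1,3,800,1200] input_ids:[1,6] attention_mask:[1,6] position_ids:[1,6] token_type_ids:[1,6] text_token_mask:[1,6,6] \
  --atol 1e-3 --rtol 1e-3

Running it once without --fp16 and once with it separates conversion bugs from precision loss.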
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!