TensorRT
❓ [Question] a10 performance drops significantly
❓ Question
I converted the GFPGAN model (https://github.com/TencentARC/GFPGAN) with torch_tensorrt and found that torch_tensorrt is twice as fast as torch on a 3070. But on one a10 server, torch_tensorrt and torch are about the same speed, and on another a10 server torch_tensorrt is even twice as slow as torch. Statistics are shown below (the two a10 entries come from two different cloud providers).
| GPU | CPU | CPU cores | CPU freq | Memory | Inference framework | CPU usage | Memory usage | GPU usage | Inference time |
|---|---|---|---|---|---|---|---|---|---|
| 3070 | AMD Ryzen 7 5800X 8-Core Processor | 16 | 2200-3800MHz | 32G | pytorch | 30-35% | 160-170% | 13.5g 987.7m | 33.889511s |
| 3070 | AMD Ryzen 7 5800X 8-Core Processor | 16 | 2200-3800MHz | 32G | torch_tensorrt | 15-20% | 180-200% | 11.7g 1.1g | 16.259879s |
| a10(v1) | Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz | 28 | 2593MHz | 112G | pytorch | 25-30% | 190-200% | 15.1g 1.2g | 33.933190s |
| a10(v1) | Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz | 28 | 2593MHz | 112G | torch_tensorrt | 15-20% | 190-200% | 13.0g 1.2g | 31.899047s |
| a10(v2) | Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz | 28 | 2300-4600MHz | 112G | pytorch | 20-30% | 180-200% | 15.1g 1.0g | 34.027398s |
| a10(v2) | Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz | 28 | 2300-4600MHz | 112G | torch_tensorrt | 10-15% | 160-170% | 13.1g 1.1g | 66.498723s |
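For reference, the conversion path above can be sketched roughly as below. This is a minimal, hypothetical example of compiling a model with the Torch-TensorRT API; the input shape, precision set, and the `compile_with_trt` helper name are assumptions, not values taken from the original report.

```python
# Hypothetical compilation sketch; the 1x3x512x512 shape is an assumed input
# size for a GFPGAN-like face-restoration model, not confirmed by the report.
INPUT_SHAPE = (1, 3, 512, 512)

def compile_with_trt(model):
    """Compile an eval-mode CUDA model with Torch-TensorRT.

    Requires torch, torch_tensorrt, and a CUDA GPU at call time; imports are
    deferred so this module can be loaded without them.
    """
    import torch
    import torch_tensorrt

    return torch_tensorrt.compile(
        model.eval().cuda(),
        inputs=[torch_tensorrt.Input(INPUT_SHAPE)],
        enabled_precisions={torch.float},  # add torch.half to try FP16
    )
```

The compiled module is then called like the original model, e.g. `trt_model(x)` with `x` on the GPU.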
I also tried torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt) and fixed some op errors; it turned out to be twice as fast as torch_tensorrt on the 3070, and its performance did not drop so strangely on the a10 servers.
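Timing comparisons like the ones above are sensitive to methodology: the first few calls can include CUDA context setup and TensorRT engine build/deserialization, and GPU work is asynchronous, so the clock must be read after the device has finished. A minimal, framework-agnostic harness sketch (the `run_once` callable and the `sync` hook, e.g. `torch.cuda.synchronize`, stand in for the real inference call and are assumptions here):

```python
import time

def benchmark(run_once, warmup=5, iters=20, sync=None):
    """Average wall time per call of `run_once`, excluding warmup.

    `sync` should flush pending asynchronous GPU work before each clock
    read (e.g. torch.cuda.synchronize); pass None for CPU-only callables.
    """
    for _ in range(warmup):          # absorb one-time setup costs
        run_once()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync:
        sync()                        # wait for the device before stopping
    return (time.perf_counter() - start) / iters

# Example with a CPU stand-in for the model call:
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"avg per call: {avg * 1e3:.3f} ms")
```

If the a10 numbers were collected without warmup or synchronization, the gap between the two servers may partly reflect measurement rather than steady-state inference speed.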
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): nvcr.io/nvidia/pytorch:23.08-py3
- CPU Architecture: as above
- OS (e.g., Linux): linux
- How you installed PyTorch (conda, pip, libtorch, source): docker
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration: as above
- Any other relevant information:
Additional context
Hi - thank you for the report. I noticed these results are using the 23.08-py3 container, and there have been many upgrades and changes to Torch-TensorRT since the version which was packaged in that container. The latest version of Torch-TensorRT can be found in the hosted Docker containers, and would be worth trying for comparison against the above.
```shell
# For the latest nightly version of Torch-TRT + Torch
docker pull ghcr.io/pytorch/tensorrt/torch_tensorrt:nightly

# For the latest stable Torch + RC of Torch-TRT
docker pull ghcr.io/pytorch/tensorrt/torch_tensorrt:release_2.1
```
@gs-olive Thanks, I will give it a try as soon as possible.