TensorRT
❓ [Question] a10 performance drops significantly
❓ Question
I converted the GFPGAN model (https://github.com/TencentARC/GFPGAN) with torch_tensorrt and found that torch_tensorrt is twice as fast as torch on a 3070. But on one a10 server, torch_tensorrt and torch are about the same speed, and on another a10 server torch_tensorrt is even twice as slow as torch. Statistics are shown below (the two a10 entries come from two different cloud providers).
| GPU | CPU | CPU cores | CPU freq | Memory | Inference framework | CPU usage | Memory usage | GPU usage | Inference time |
|---|---|---|---|---|---|---|---|---|---|
| 3070 | AMD Ryzen 7 5800X 8-Core Processor | 16 | 2200-3800MHz | 32G | pytorch | 30-35% | 160-170% | 13.5g 987.7m | 33.889511s |
| 3070 | AMD Ryzen 7 5800X 8-Core Processor | 16 | 2200-3800MHz | 32G | torch_tensorrt | 15-20% | 180-200% | 11.7g 1.1g | 16.259879s |
| a10(v1) | Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz | 28 | 2593MHz | 112G | pytorch | 25-30% | 190-200% | 15.1g 1.2g | 33.933190s |
| a10(v1) | Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz | 28 | 2593MHz | 112G | torch_tensorrt | 15-20% | 190-200% | 13.0g 1.2g | 31.899047s |
| a10(v2) | Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz | 28 | 2300-4600MHz | 112G | pytorch | 20-30% | 180-200% | 15.1g 1.0g | 34.027398s |
| a10(v2) | Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz | 28 | 2300-4600MHz | 112G | torch_tensorrt | 10-15% | 160-170% | 13.1g 1.1g | 66.498723s |
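For reference, the conversion path above can be sketched roughly as below. This is a minimal, hypothetical example of compiling a model with the Torch-TensorRT API; the input shape, precision set, and the `compile_with_trt` helper name are assumptions, not values taken from the original report.

```python
# Hypothetical compilation sketch; the 1x3x512x512 shape is an assumed input
# size for a GFPGAN-like face-restoration model, not confirmed by the report.
INPUT_SHAPE = (1, 3, 512, 512)

def compile_with_trt(model):
    """Compile an eval-mode CUDA model with Torch-TensorRT.

    Requires torch, torch_tensorrt, and a CUDA GPU at call time; imports are
    deferred so this module can be loaded without them.
    """
    import torch
    import torch_tensorrt

    return torch_tensorrt.compile(
        model.eval().cuda(),
        inputs=[torch_tensorrt.Input(INPUT_SHAPE)],
        enabled_precisions={torch.float},  # add torch.half to try FP16
    )
```

The compiled module is then called like the original model, e.g. `trt_model(x)` with `x` on the GPU.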
I also tried torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt) and fixed some op errors; it turned out to be twice as fast as torch_tensorrt on the 3070, and its performance did not drop so strangely on the a10 servers.
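Timing comparisons like the ones above are sensitive to methodology: the first few calls can include CUDA context setup and TensorRT engine build/deserialization, and GPU work is asynchronous, so the clock must be read after the device has finished. A minimal, framework-agnostic harness sketch (the `run_once` callable and the `sync` hook, e.g. `torch.cuda.synchronize`, stand in for the real inference call and are assumptions here):

```python
import time

def benchmark(run_once, warmup=5, iters=20, sync=None):
    """Average wall time per call of `run_once`, excluding warmup.

    `sync` should flush pending asynchronous GPU work before each clock
    read (e.g. torch.cuda.synchronize); pass None for CPU-only callables.
    """
    for _ in range(warmup):          # absorb one-time setup costs
        run_once()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync:
        sync()                        # wait for the device before stopping
    return (time.perf_counter() - start) / iters

# Example with a CPU stand-in for the model call:
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"avg per call: {avg * 1e3:.3f} ms")
```

If the a10 numbers were collected without warmup or synchronization, the gap between the two servers may partly reflect measurement rather than steady-state inference speed.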
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): nvcr.io/nvidia/pytorch:23.08-py3
- CPU Architecture: as above
- OS (e.g., Linux): linux
- How you installed PyTorch (conda, pip, libtorch, source): docker
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration: as above
- Any other relevant information:
Additional context
Hi - thank you for the report. I noticed these results are using the 23.08-py3 container, and there have been many upgrades and changes to Torch-TensorRT since the version which was packaged in that container. The latest version of Torch-TensorRT can be found in the hosted Docker containers, and would be worth trying for comparison against the above.
```shell
# For the latest nightly version of Torch-TRT + Torch
docker pull ghcr.io/pytorch/tensorrt/torch_tensorrt:nightly

# For the latest stable Torch + RC of Torch-TRT
docker pull ghcr.io/pytorch/tensorrt/torch_tensorrt:release_2.1
```
@gs-olive Thanks, I will give it a try as soon as possible.