
Two GPUs are slower than one

Open OleksandrKorovii opened this issue 2 years ago • 0 comments

Hi, I run the Triton server on two NVIDIA RTX 3090 Ti GPUs with --shm-size 20g. With both GPUs, inference takes about 1.56 s. If I run the server on only one GPU by setting --gpus '"device=0"', inference takes about 860 ms. The input sequence length was 256 tokens. I optimized GPT2-medium with your script:

convert_model -m gpt2-medium \
    --backend tensorrt onnx \
    --seq-len 32 512 512 \
    --task text-generation --atol=2
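
To compare the one-GPU and two-GPU setups on equal footing, a small timing harness along these lines could be used. This is only a sketch: `measure_latency`, `summarize`, and `fake_infer` are hypothetical names, and `fake_infer` is a placeholder to be replaced with the actual request to the server. It discards a few warm-up calls and reports the median rather than a single reading.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(infer: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Return per-call wall-clock latencies in seconds, measured after warm-up calls."""
    # Warm-up calls are not timed, so one-off initialization cost is excluded.
    for _ in range(warmup):
        infer()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce a list of latencies to median, minimum, and maximum."""
    return {
        "median_s": statistics.median(latencies),
        "min_s": min(latencies),
        "max_s": max(latencies),
    }


def fake_infer() -> None:
    # Hypothetical stand-in for one inference request; replace with the real call.
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(measure_latency(fake_infer)))
```

Running the same harness once against each server configuration gives two summaries that can be compared directly.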

OleksandrKorovii • Dec 07 '22 16:12