qwen1.5-7b: Why do I need 37 GB of GPU memory?
System Info
GPU: NVIDIA A100 PCIe 40 GB
TensorRT-LLM version: 0.12.0.dev2024070200

Build commands:

```
python convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
                             --output_dir ./tllm_checkpoint_1gpu_fp16 \
                             --dtype float16

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
             --output_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu \
             --gemm_plugin float16 \
             --max_batch_size 1 \
             --max_input_len 1 \
             --max_seq_len 3 \
             --max_num_tokens 1
```
When I run TensorRT-LLM 0.12.0 with qwen1.5-7b, it requires 37 GB of GPU memory:

```
NVIDIA A100 PCIe 40GB   N/A   32C   P0   40W / 250W   |   37957MiB / 40960MiB
```
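For scale: assuming Qwen1.5-7B's published config (roughly 7.7B parameters, 32 layers, 32 attention heads, head dim 128; numbers taken from the model card, not measured here), the fp16 weights alone are about 7.7e9 × 2 bytes ≈ 14.3 GiB, and the KV cache at max_batch_size=1 / max_seq_len=3 is only 32 layers × 2 (K and V) × 32 heads × 128 × 3 tokens × 2 bytes ≈ 1.5 MiB. The ~23 GB beyond the weights is therefore consistent with the runtime pre-allocating most of the remaining free GPU memory for its paged KV-cache pool, rather than with the engine itself growing.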
Who can help?
No response
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
1. With TensorRT-LLM 0.12.0, qwen1.5-7b uses excessive GPU memory: 37 GB.
2. With TensorRT-LLM 0.8.0, the same qwen1.5-7b model uses only 17 GB.

Per-process usage can be confirmed with nvidia-smi, as sketched below.
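For a like-for-like comparison between the two versions, per-process GPU memory can be read with standard nvidia-smi query options:

```
# Memory used by each compute process (engine weights + runtime pools)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Total used vs. total device memory
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```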
Expected behavior
Memory usage comparable to TensorRT-LLM 0.8.0, i.e. about 17 GB for qwen1.5-7b instead of 37 GB. I want this version to reduce GPU memory usage.
actual behavior
TensorRT-LLM 0.12.0 uses 37 GB of GPU memory for qwen1.5-7b.
additional notes
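I suspect the difference is the paged KV-cache pool that newer versions pre-allocate from free GPU memory by default. If so, usage should shrink when the pre-allocation fraction is lowered at runtime; a minimal sketch, assuming the --kv_cache_free_gpu_memory_fraction option exposed by examples/run.py in recent releases (I have not verified the exact flag name in 0.12.0):

```
# Engine was built with max_input_len=1 / max_seq_len=3, so keep I/O tiny
python3 examples/run.py \
    --engine_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu \
    --tokenizer_dir ./tmp/Qwen/7B/ \
    --max_output_len 2 \
    --kv_cache_free_gpu_memory_fraction 0.2 \
    --input_text "hi"
```

With the fraction lowered, nvidia-smi should report usage close to the weights (~15 GB) rather than ~37 GB, which would confirm the pool is the cause rather than a leak.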