qwen1.5-7b: Why do I need 37 GB of GPU memory?
System Info
GPU: NVIDIA A100 PCIe 40 GB
TensorRT-LLM version: 0.12.0.dev2024070200

Build commands:

```
python convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
                             --output_dir ./tllm_checkpoint_1gpu_fp16 \
                             --dtype float16

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
             --output_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu \
             --gemm_plugin float16 \
             --max_batch_size 1 \
             --max_input_len 1 \
             --max_seq_len 3 \
             --max_num_tokens 1
```
When I run TensorRT-LLM 0.12.0 with qwen1.5-7b, it requires 37 GB of GPU memory:

```
NVIDIA A100 PCIe 40GB   N/A   32C   P0   40W / 250W   |   37957MiB / 40960MiB
```
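For scale: assuming Qwen1.5-7B's published config (roughly 7.7B parameters, 32 layers, 32 attention heads, head dim 128; numbers taken from the model card, not measured here), the fp16 weights alone are about 7.7e9 × 2 bytes ≈ 14.3 GiB, and the KV cache at max_batch_size=1 / max_seq_len=3 is only 32 layers × 2 (K and V) × 32 heads × 128 × 3 tokens × 2 bytes ≈ 1.5 MiB. The ~23 GB beyond the weights is therefore consistent with the runtime pre-allocating most of the remaining free GPU memory for its paged KV-cache pool, rather than with the engine itself growing.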
Who can help?
No response
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
1. With TensorRT-LLM 0.12.0, qwen1.5-7b uses excessive GPU memory: 37 GB.
2. With TensorRT-LLM 0.8.0, the same qwen1.5-7b model uses only 17 GB.

Per-process usage can be confirmed with nvidia-smi, as sketched below.
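For a like-for-like comparison between the two versions, per-process GPU memory can be read with standard nvidia-smi query options:

```
# Memory used by each compute process (engine weights + runtime pools)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Total used vs. total device memory
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```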
Expected behavior
Memory usage comparable to TensorRT-LLM 0.8.0, i.e. about 17 GB for qwen1.5-7b instead of 37 GB. I want this version to reduce GPU memory usage.
actual behavior
TensorRT-LLM 0.12.0 uses 37 GB of GPU memory for qwen1.5-7b.
additional notes
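I suspect the difference is the paged KV-cache pool that newer versions pre-allocate from free GPU memory by default. If so, usage should shrink when the pre-allocation fraction is lowered at runtime; a minimal sketch, assuming the --kv_cache_free_gpu_memory_fraction option exposed by examples/run.py in recent releases (I have not verified the exact flag name in 0.12.0):

```
# Engine was built with max_input_len=1 / max_seq_len=3, so keep I/O tiny
python3 examples/run.py \
    --engine_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu \
    --tokenizer_dir ./tmp/Qwen/7B/ \
    --max_output_len 2 \
    --kv_cache_free_gpu_memory_fraction 0.2 \
    --input_text "hi"
```

With the fraction lowered, nvidia-smi should report usage close to the weights (~15 GB) rather than ~37 GB, which would confirm the pool is the cause rather than a leak.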