Memory leak while using tensor_parallel_size>1
Can you provide more details on which model you are using and how many GPUs you are running on? Any additional details would be helpful. Thank you!
I'm running StarCoder on 2 × A10. The command is as follows:
python -m vllm.entrypoints.api_server --model /model/starchat/starcoder-codewovb-wlmhead-mg2hf41 --tensor-parallel-size 2 --gpu-memory-utilization 0.90 --host 0.0.0.0 --port 8081 --max-num-batched-tokens 5120
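For anyone trying to confirm the leak on a similar setup, a minimal sketch like the one below can track the server's host RSS while it handles requests. The `/generate` endpoint, the request payload, and the server PID are assumptions here; adjust them to your own deployment. With TP > 1 the worker processes matter too, so the sketch sums RSS over the main process and its children.

```python
# Minimal monitoring sketch, assuming the demo api_server is already running
# on port 8081 and exposes a /generate endpoint (adjust URL/payload as needed).
import time
import psutil
import requests

SERVER_PID = 12345  # hypothetical: PID of the `python -m vllm.entrypoints.api_server` process
URL = "http://0.0.0.0:8081/generate"
PAYLOAD = {"prompt": "def fib(n):", "max_tokens": 128}  # assumed request schema

def total_rss_gib(pid: int) -> float:
    """Sum RSS of the server process and its children (TP workers run as subprocesses)."""
    parent = psutil.Process(pid)
    procs = [parent] + parent.children(recursive=True)
    return sum(p.memory_info().rss for p in procs) / 1024**3

for i in range(1000):
    requests.post(URL, json=PAYLOAD, timeout=120)
    if i % 50 == 0:
        print(f"request {i}: host RSS = {total_rss_gib(SERVER_PID):.2f} GiB")
    time.sleep(0.1)
```

If the reported RSS keeps growing monotonically across many requests rather than plateauing, that is a reasonable indication of the leak reported here.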
Same issue when loading Llama 2 70B on 4 GPUs.
Same issue when loading Llama 2 70B on 2 GPUs.
Same issue with Mixtral 8x7B Instruct v0.1 (non-quantized).
We hit the same issue with Mistral 7B: TP = 4, GPUs = 4 × A10, vLLM = 0.2.7.
Also seeing the memory leak, even with tensor_parallel_size = 1: TP = 1, GPU = 1 × A30, vLLM = 0.3.3.
Also seeing the memory leak: TP = 2, GPUs = 2 × V100, vLLM = 0.4.2.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
TP = 2, GPUs = 2 × V100, Llama 3.1 8B Instruct, vLLM version = 0.6.2. When I shut down the server it shows a warning about leaked memory.
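For the shutdown warning specifically, one pattern some users try when running vLLM in-process (via the `LLM` class rather than the standalone server) is to tear the engine and distributed state down explicitly before the interpreter exits, instead of relying on destructors. The sketch below is only that: an explicit-cleanup pattern, not an official fix. The `destroy_model_parallel` import path and the model name are assumptions and may differ across vLLM versions.

```python
# Explicit-teardown sketch for in-process use with tensor_parallel_size > 1.
# The destroy_model_parallel import path is an assumption for this vLLM version.
import gc
import torch
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import destroy_model_parallel

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)  # illustrative model id
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

# Release the engine and distributed state explicitly before exit.
destroy_model_parallel()
del llm
gc.collect()
torch.cuda.empty_cache()
if torch.distributed.is_initialized():
    torch.distributed.destroy_process_group()
```

This does not address growth during serving, but in some setups it avoids the leaked-resource warnings printed at shutdown.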
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue still exists
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!