Memory leak while using tensor_parallel_size>1
Can you provide more details on which model you are using and how many GPUs you are running on? Any additional details would be helpful. Thank you!
I'm running StarCoder on 2 × A10. The command is as follows:
python -m vllm.entrypoints.api_server --model /model/starchat/starcoder-codewovb-wlmhead-mg2hf41 --tensor-parallel-size 2 --gpu-memory-utilization 0.90 --host 0.0.0.0 --port 8081 --max-num-batched-tokens 5120
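For anyone trying to confirm the leak on a similar setup, a minimal sketch like the one below can track the server's host RSS while it handles requests. The `/generate` endpoint, the request payload, and the server PID are assumptions here; adjust them to your own deployment. With TP > 1 the worker processes matter too, so the sketch sums RSS over the main process and its children.

```python
# Minimal monitoring sketch, assuming the demo api_server is already running
# on port 8081 and exposes a /generate endpoint (adjust URL/payload as needed).
import time
import psutil
import requests

SERVER_PID = 12345  # hypothetical: PID of the `python -m vllm.entrypoints.api_server` process
URL = "http://0.0.0.0:8081/generate"
PAYLOAD = {"prompt": "def fib(n):", "max_tokens": 128}  # assumed request schema

def total_rss_gib(pid: int) -> float:
    """Sum RSS of the server process and its children (TP workers run as subprocesses)."""
    parent = psutil.Process(pid)
    procs = [parent] + parent.children(recursive=True)
    return sum(p.memory_info().rss for p in procs) / 1024**3

for i in range(1000):
    requests.post(URL, json=PAYLOAD, timeout=120)
    if i % 50 == 0:
        print(f"request {i}: host RSS = {total_rss_gib(SERVER_PID):.2f} GiB")
    time.sleep(0.1)
```

If the reported RSS keeps growing monotonically across many requests rather than plateauing, that is a reasonable indication of the leak reported here.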
Same issue when loading Llama 2 70B on 4 GPUs.
Same issue when loading Llama 2 70B on 2 GPUs.
Same issue with Mixtral 8x7B Instruct v0.1 (non-quantized).
We hit the same issue with Mistral 7B: TP = 4, GPUs = 4 × A10, vLLM = 0.2.7.
Also seeing the memory leak, even with tensor_parallel_size = 1: TP = 1, GPU = 1 × A30, vLLM = 0.3.3.
Also seeing the memory leak: TP = 2, GPUs = 2 × V100, vLLM = 0.4.2.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
TP = 2, GPUs = 2 × V100, Llama 3.1 8B Instruct, vLLM version = 0.6.2. When I shut down the server it shows a warning about leaked memory.
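For the shutdown warning specifically, one pattern some users try when running vLLM in-process (via the `LLM` class rather than the standalone server) is to tear the engine and distributed state down explicitly before the interpreter exits, instead of relying on destructors. The sketch below is only that: an explicit-cleanup pattern, not an official fix. The `destroy_model_parallel` import path and the model name are assumptions and may differ across vLLM versions.

```python
# Explicit-teardown sketch for in-process use with tensor_parallel_size > 1.
# The destroy_model_parallel import path is an assumption for this vLLM version.
import gc
import torch
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import destroy_model_parallel

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)  # illustrative model id
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

# Release the engine and distributed state explicitly before exit.
destroy_model_parallel()
del llm
gc.collect()
torch.cuda.empty_cache()
if torch.distributed.is_initialized():
    torch.distributed.destroy_process_group()
```

This does not address growth during serving, but in some setups it avoids the leaked-resource warnings printed at shutdown.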
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue still exists
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!