
unload the model

mira-roza opened this issue 1 year ago • 12 comments

Hi, I'm sorry, I can't figure out how to unload a model. I load a model, delete the object, and call the garbage collector, but it does nothing. How are we supposed to unload a model? I want to load a model, run a batch, then load another and run a batch, and so on for multiple models in order to compare them. For now I have to restart Python each time.

mira-roza avatar Mar 08 '24 13:03 mira-roza

Try calling torch.cuda.empty_cache() after you delete the LLM object

hmellor avatar Mar 09 '24 11:03 hmellor

You can also call gc.collect() to reclaim garbage objects immediately after you delete them.

chenxu2048 avatar Mar 11 '24 05:03 chenxu2048
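For reference, a minimal sketch combining the two suggestions above (drop the reference, force a collection pass, then empty the CUDA cache); model_name and prompts are placeholders:

import gc

import torch
from vllm import LLM

llm = LLM(model=model_name)       # load the model (model_name is a placeholder)
outputs = llm.generate(prompts)   # run a batch (prompts is a placeholder)

del llm                     # drop the last reference to the engine
gc.collect()                # collect the now-unreachable objects immediately
torch.cuda.empty_cache()    # hand cached blocks back to the CUDA driver

As the rest of this thread shows, this alone often does not release all of the GPU memory.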

(screenshot) Neither of these works.

mira-roza avatar Mar 11 '24 09:03 mira-roza

You should also clear the notebook output: https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code

chenxu2048 avatar Mar 11 '24 13:03 chenxu2048

I always do (in the GUI, not in my cells).

mira-roza avatar Mar 11 '24 14:03 mira-roza

This seems mostly solved by #1908 with:

import gc

import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Load the model via vLLM
llm = LLM(model=model_name, download_dir=saver_dir, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

# Delete the llm object and free the memory
destroy_model_parallel()
del llm.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully delete the llm pipeline and free the GPU memory!")

mnoukhov avatar Mar 28 '24 15:03 mnoukhov
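Note: in more recent vLLM releases the parallel_state module appears to have moved, so the equivalent import would be something like the line below; check it against the version you have installed:

from vllm.distributed.parallel_state import destroy_model_parallel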

> This seems mostly solved by #1908 with: (snippet quoted above)

I had already read that. My problem remains unsolved when I use the vLLM wrapper from LlamaIndex; otherwise it almost works. A little memory stays in use (~1GB), but at least I can load and unload the models. The problem is that I can't find how to access the llm_engine member of vllm.LLM through the wrapper.

mira-roza avatar Apr 02 '24 12:04 mira-roza
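If the LlamaIndex wrapper is what hides llm_engine, one possible sketch is to dig out the underlying vllm.LLM first. The attribute name _client below is an assumption about the wrapper's internals, not something confirmed in this thread; inspect vars(llm) to find where the engine is actually stored:

import gc

import torch
from llama_index.llms.vllm import Vllm

llm = Vllm(model=model_name)  # model_name is a placeholder

# _client is assumed to hold the underlying vllm.LLM instance (unverified)
inner = getattr(llm, "_client", None)
if inner is not None:
    del inner.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()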

@chenxu2048 the notebook output is just computed data shown to the user, the Python kernel computes it but it's a one-way communication - the output doesn't affect the kernel at all. Therefore clearing the output will have no effect on GPU memory or any other state of the kernel.

vvolhejn avatar Oct 01 '24 12:10 vvolhejn

No resolute answer given. Can a model be unloaded from GPU RAM with vLLM? Yes or no?

david-koleckar avatar Oct 08 '24 19:10 david-koleckar

> No resolute answer given. Can a model be unloaded from GPU RAM with vLLM? Yes or no?

Worst case scenario: use the notebook magic %%writefile to write the script to a Python file, and run that file from within the notebook. When the vLLM run finishes, the memory will be reclaimed.

powerLEO101 avatar Oct 31 '24 23:10 powerLEO101
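A rough sketch of that workaround, split across two notebook cells; the file name, model, and prompt are just for illustration:

%%writefile run_batch.py
from vllm import LLM

llm = LLM(model="facebook/opt-125m")   # placeholder model
for out in llm.generate(["Hello, my name is"]):
    print(out.outputs[0].text)

and then, in a second cell, run the script as a child process so all of its GPU memory is released when it exits:

!python run_batch.py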

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Feb 14 '25 01:02 github-actions[bot]

Can this be done using an HTTP API, or with a simple timeout when no requests arrive?

ghost avatar Mar 03 '25 15:03 ghost
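One way an HTTP-based workflow could look, as a sketch (the model and flags are placeholders, and startup handling is omitted); the point is simply that killing the server process is what frees its GPU memory:

import subprocess

# start vLLM's OpenAI-compatible server as a separate process
server = subprocess.Popen([
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "facebook/opt-125m",   # placeholder model
    "--port", "8000",
])

# ... wait for startup, send requests to http://localhost:8000/v1/..., collect results ...

# stopping the process releases all of its GPU memory
server.terminate()
server.wait()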

This just never seems to work. Sometimes I'm not even able to terminate the main process when I need to, since the memory is never freed.

AetherPrior avatar Apr 25 '25 00:04 AetherPrior

> Can this be done using an HTTP API, or with a simple timeout when no requests arrive?

+1 for an API.

mpetruc avatar Jul 07 '25 22:07 mpetruc

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Oct 06 '25 02:10 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Nov 05 '25 02:11 github-actions[bot]