ART
CUDA Out of Memory
Hi guys, I am trying to run the Qwen2.5 3B Instruct model on a GPU with 24 GB of VRAM, but it goes out of memory when vLLM is capturing CUDA graphs. It looks like setting vLLM's gpu_memory_utilization option to around 0.7 would free up enough GPU memory. Is there a way to pass this flag during backend initialization? Another interesting detail: this happens when I run it on Databricks with a 24 GB GPU, but the same code runs fine on a local machine with an RTX 3090 (also 24 GB). I'm not sure what causes the difference. Thank you.
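As a back-of-the-envelope check on why lowering the utilization fraction helps: vLLM pre-allocates roughly gpu_memory_utilization × total VRAM for weights and KV cache, and CUDA graph capture needs headroom outside that budget. The sketch below just does that arithmetic; the exact overheads depend on the driver, model, and capture settings, and the numbers are illustrative only.

```python
# Illustrative only: how much VRAM vLLM claims at a given
# gpu_memory_utilization, and what is left over for everything else
# (CUDA graph capture, other processes, fragmentation).

def vllm_budget_gib(total_vram_gib: float, gpu_memory_utilization: float):
    """Return (claimed_by_vllm, headroom) in GiB."""
    claimed = total_vram_gib * gpu_memory_utilization
    return claimed, total_vram_gib - claimed

# vLLM's default utilization is 0.9 -> only ~2.4 GiB of headroom on a 24 GB card.
claimed, headroom = vllm_budget_gib(24.0, 0.9)
print(f"0.90 -> vLLM claims {claimed:.1f} GiB, headroom {headroom:.1f} GiB")

# At 0.68 (the value in the fix below) the headroom grows to ~7.7 GiB.
claimed, headroom = vllm_budget_gib(24.0, 0.68)
print(f"0.68 -> vLLM claims {claimed:.1f} GiB, headroom {headroom:.1f} GiB")
```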
For anyone interested, here's the solution:
model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    base_model="unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit",
    # Engine arguments here are forwarded to vLLM when the backend starts.
    _internal_config=art.dev.InternalModelConfig(
        engine_args=art.dev.EngineArgs(
            gpu_memory_utilization=0.68,  # cap vLLM's share of VRAM (default is 0.9)
            enforce_eager=True,           # skip CUDA graph capture to save memory
        )
    ),
)
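If you run the same code on cards of different sizes, a small helper can pick the utilization fraction so that a fixed amount of headroom is always left free. This is a hypothetical convenience function, not part of ART or vLLM, and the 7 GiB headroom default is just the kind of margin that worked in the fix above; tune it for your setup.

```python
# Hypothetical helper (not an ART/vLLM API): choose a gpu_memory_utilization
# that leaves `headroom_gib` of VRAM free for CUDA graphs and other overhead.

def pick_gpu_memory_utilization(total_vram_gib: float,
                                headroom_gib: float = 7.0,
                                floor: float = 0.3,
                                ceiling: float = 0.95) -> float:
    """Fraction of VRAM to hand to vLLM, clamped to a sane range."""
    frac = (total_vram_gib - headroom_gib) / total_vram_gib
    return max(floor, min(ceiling, round(frac, 2)))

print(pick_gpu_memory_utilization(24.0))  # 24 GiB card (e.g. RTX 3090)
print(pick_gpu_memory_utilization(80.0))  # larger card -> larger safe fraction
```

The resulting value would then go into `art.dev.EngineArgs(gpu_memory_utilization=...)` as in the snippet above.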