
CUDA out of memory

Open ck-amrahd opened this issue 4 months ago • 1 comments

Hi guys, I am trying to run the Qwen 3B Instruct model on a GPU with 24 GB of VRAM, but when vLLM is capturing CUDA graphs it goes out of memory. It looks like setting vLLM's gpu_memory_utilization config to around 0.7 would free up enough GPU memory. Is there a way to pass this flag during backend initialization? Another interesting wrinkle: this happens when I run on Databricks with a 24 GB GPU, but on a local machine with an RTX 3090 it runs fine. Not sure what the cause is. Thank you.

ck-amrahd avatar Nov 09 '25 16:11 ck-amrahd

For anyone interested, here's the solution:

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    base_model="unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit",
    _internal_config=art.dev.InternalModelConfig(
        engine_args=art.dev.EngineArgs(
            # Cap vLLM's share of VRAM so the rest of the card stays free
            gpu_memory_utilization=0.68,
            # Skip CUDA-graph capture, which was triggering the OOM
            enforce_eager=True,
        )
    ),
)
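One way to arrive at a value like 0.68 instead of guessing is to budget the VRAM you want vLLM to claim and divide by the device's total memory. A minimal sketch (the helper name and the specific GiB figures are mine, only the 24 GB total is from the issue):

```python
def vllm_memory_fraction(budget_gib: float, total_gib: float) -> float:
    """Fraction to pass as gpu_memory_utilization.

    Leaves (total_gib - budget_gib) GiB free for whatever else shares
    the card, e.g. the training process or CUDA-graph capture overhead.
    """
    if not 0 < budget_gib <= total_gib:
        raise ValueError("budget must be positive and fit on the device")
    return round(budget_gib / total_gib, 2)

# On a 24 GB card, handing vLLM ~16.3 GiB leaves ~7.7 GiB of headroom
# and yields roughly the 0.68 used in the config above.
print(vllm_memory_fraction(16.32, 24.0))  # → 0.68
```

The same two settings can also be passed to vLLM directly (outside ART) as `gpu_memory_utilization` and `enforce_eager` engine arguments; `enforce_eager=True` trades some inference speed for skipping CUDA-graph capture entirely.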

ck-amrahd avatar Nov 10 '25 21:11 ck-amrahd