
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments during vLLM model initialization

Open abhinav262666 opened this issue 5 months ago • 3 comments

Description

Summary

When training a LangGraph agent with openpipe-art[backend,langgraph], the process fails at model initialization with the following error:

RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.

The error occurs inside vLLM when allocating CUDA parameters during model initialization.
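A possible workaround, not verified on this setup: the error typically appears when PyTorch's caching allocator is configured with expandable segments (via the documented `PYTORCH_CUDA_ALLOC_CONF` environment variable), which `torch.cuda.MemPool` does not support. Explicitly disabling the option before torch is first imported may let initialization proceed:

```python
import os

# Possible workaround (assumption, not a confirmed fix): torch.cuda.MemPool
# rejects allocators configured with expandable segments, so explicitly
# disabling the option before torch is first imported may avoid the
# RuntimeError during vLLM's parameter allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:False"

# Note: torch, vllm, and art must be imported only after this line,
# because the allocator reads the variable once at initialization.
```

If the variable was already set to `expandable_segments:True` elsewhere (e.g. in a launcher script or systemd unit), unsetting it there would have the same effect.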

Environment

  • OS: Linux
  • GPUs: 2x NVIDIA L4 (23 GB each)
  • CUDA: 12.4 (nvcc --version shows Cuda compilation tools, release 12.4, V12.4.131)
  • NVIDIA driver: 550.90.07
  • Python: 3.12.x (venv with uv)
  • Installed via: pip install openpipe-art[backend,langgraph]
  • Dependency versions (from uv.lock):
    • torch==2.7.1
    • vllm==0.10.0
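
For completeness, a small hypothetical helper (the function name and package list are illustrative, not part of ART) that collects the version pins listed above for a bug report; packages missing from the active environment are flagged instead of raising:

```python
import importlib.metadata as md

# Hypothetical helper for gathering the versions reported above; any
# package missing from the active environment is flagged rather than
# crashing the report.
def report_versions(packages=("torch", "vllm", "openpipe-art")):
    lines = []
    for pkg in packages:
        try:
            lines.append(f"{pkg}=={md.version(pkg)}")
        except md.PackageNotFoundError:
            lines.append(f"{pkg}: not installed")
    return lines

print("\n".join(report_versions()))
```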

Steps to reproduce

  1. Create a new Python 3.12 virtual environment.
  2. uv add "openpipe-art[backend,langgraph]>=0.4.11" (the specifier must be quoted so the shell does not interpret the brackets or >=)
  3. Run training (which calls art.model.register()).
  4. Observe the crash at model initialization.

Logs

File ".../vllm/model_executor/layers/vocab_parallel_embedding.py", line 34, in __init__
    weight = Parameter(torch.empty(sum(output_partition_sizes), ...))
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.

Request

  • Please confirm whether the currently pinned torch (2.7.1) + vllm (0.10.0) combination is expected to work with CUDA 12.4 / L4 GPUs.
  • If not, could you provide a tested torch/vllm/xformers pinset for CUDA 12.4?
  • Alternatively, handle this error gracefully (or document the required versions) so users don’t hit this blocker.

Happy to provide full logs (pip freeze, nvcc, etc.) if needed.

abhinav262666 · Sep 16 '25 15:09