RuntimeError on ROCm
Command used:
python benchmark_throughput.py --model gpt2 --input-len 256 --output-len 256
Output:
INFO 01-24 14:52:52 llm_engine.py:72] Initializing an LLM engine with config: model='gpt2', tokenizer='gpt2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.3.0.dev20240123+rocm5.7)
Python 3.10.13 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
INFO 01-24 14:52:55 weight_utils.py:164] Using model weights format ['*.safetensors']
Traceback (most recent call last):
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 318, in <module>
    main(args)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 205, in main
    elapsed_time = run_vllm(requests, args.model, args.tokenizer,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 76, in run_vllm
    llm = LLM(
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 106, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 350, in from_engine_args
    engine = cls(*engine_configs,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 112, in __init__
    self._init_cache()
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 303, in _init_cache
    num_blocks = self._run_workers(
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 977, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 116, in profile_num_available_blocks
    free_gpu_memory, total_gpu_memory = torch.cuda.mem_get_info()
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 655, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: HIP error: invalid argument
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
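Note that the failure is raised inside torch.cuda.mem_get_info() (called from vLLM's profile_num_available_blocks), before any vLLM kernel runs, so it should be reproducible without vLLM at all. A minimal sketch of such a probe (the probe_gpu_memory helper is hypothetical and assumes the ROCm nightly build of PyTorch listed below):

```python
# Hypothetical probe that isolates the failing call from vLLM.
# Assumes a ROCm build of PyTorch; degrades gracefully otherwise.
try:
    import torch
except ImportError:
    torch = None


def probe_gpu_memory(device: int = 0) -> str:
    """Call the same API vLLM's profiler uses and report the outcome."""
    if torch is None or not torch.cuda.is_available():
        return "no usable torch/GPU in this environment"
    try:
        free, total = torch.cuda.mem_get_info(device)
        return f"free={free} total={total}"
    except RuntimeError as err:
        # On the failing node this is expected to be:
        # "HIP error: invalid argument"
        return f"failed: {err}"


if __name__ == "__main__":
    print(probe_gpu_memory())
```

If this already fails with the same HIP error, the problem lies in the PyTorch/ROCm stack or the runtime environment rather than in vLLM itself.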
Installed packages:
accelerate 0.26.1
aiohttp 3.9.1
aioprometheus 23.12.0
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.2.0
async-timeout 4.0.3
attrs 23.2.0
bert-score 0.3.13
bitsandbytes 0.42.0
certifi 2022.12.7
charset-normalizer 2.1.1
chex 0.1.85
click 8.1.7
cmake 3.28.1
contourpy 1.2.0
cycler 0.12.1
datasets 2.16.1
demjson3 3.0.6
dill 0.3.7
einops 0.7.0
etils 1.6.0
evaluate 0.4.1
exceptiongroup 1.2.0
fastapi 0.109.0
filelock 3.9.0
flash-attn 2.0.4
flax 0.8.0
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.10.0
h11 0.14.0
httptools 0.6.1
huggingface-hub 0.20.3
idna 3.4
importlib-resources 6.1.1
interegular 0.3.3
jax 0.4.23
jaxlib 0.4.23
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
Levenshtein 0.23.0
lm-format-enforcer 0.8.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.2
mdurl 0.1.2
ml-dtypes 0.3.2
mpmath 1.2.1
msgpack 1.0.7
multidict 6.0.4
multiprocess 0.70.15
nest-asyncio 1.6.0
networkx 3.0rc1
ninja 1.11.1.1
nltk 3.8.1
numpy 1.26.3
openai 0.28.1
opt-einsum 3.3.0
optax 0.1.8
orbax-checkpoint 0.5.1
orjson 3.9.12
packaging 23.2
pandas 1.5.3
Pillow 9.3.0
pip 23.3.2
protobuf 3.20.3
psutil 5.9.8
pyarrow 14.0.2
pyarrow-hotfix 0.6
pydantic 2.5.3
pydantic_core 2.14.6
Pygments 2.17.2
pyinfer 0.0.3
pyparsing 3.1.1
python-dateutil 2.8.2
python-dotenv 0.21.1
pytorch-triton-rocm 2.2.0+dafe145982
pytz 2023.3.post1
PyYAML 6.0.1
quantile-python 1.1
rapidfuzz 3.6.1
ray 2.9.1
referencing 0.32.1
regex 2023.12.25
requests 2.31.0
responses 0.18.0
rich 13.7.0
rouge_score 0.1.2
rpds-py 0.17.1
sacremoses 0.1.1
safetensors 0.4.1
scandeval 9.2.0
scikit-learn 1.4.0
scipy 1.12.0
sentencepiece 0.1.99
seqeval 1.2.2
setuptools 65.5.0
six 1.16.0
sniffio 1.3.0
starlette 0.35.1
sympy 1.11.1
tabulate 0.9.0
tensorstore 0.1.52
termcolor 2.4.0
threadpoolctl 3.2.0
tiktoken 0.5.2
tokenizers 0.15.1
toolz 0.12.1
torch 2.3.0.dev20240123+rocm5.7
torchaudio 2.2.0.dev20240123+rocm5.7
torchvision 0.18.0.dev20240123+rocm5.7
tqdm 4.66.1
transformers 4.37.0
typing_extensions 4.9.0
urllib3 1.26.13
uvicorn 0.27.0
uvloop 0.19.0
vllm 0.2.7+rocm573
watchfiles 0.21.0
websockets 12.0
xformers 0.0.23
xxhash 3.4.1
yarl 1.9.4
zipp 3.17.0
This is running in the rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 container (with the PyTorch 2.3.0 ROCm 5.7 nightly installed on top of it, as listed above) on a LUMI node with MI250X GPUs.
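For what it's worth, on Slurm-managed AMD systems such as LUMI, a HIP "invalid argument" on the very first device call is sometimes caused by conflicting device-visibility variables (for example, ROCR_VISIBLE_DEVICES set by Slurm alongside HIP_VISIBLE_DEVICES). This is only a hedged diagnostic sketch, not a confirmed fix:

```python
# Diagnostic sketch: print the device-visibility variables that the
# HIP/ROCm runtime consults, to spot conflicting settings.
import os


def report_gpu_visibility() -> dict:
    """Collect the device-visibility environment variables (None if unset)."""
    names = (
        "ROCR_VISIBLE_DEVICES",
        "HIP_VISIBLE_DEVICES",
        "CUDA_VISIBLE_DEVICES",
    )
    return {name: os.environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in report_gpu_visibility().items():
        print(f"{name}={value!r}")
```

If more than one of these is set to different values inside the container, unsetting all but one before launching the benchmark would be a cheap thing to try.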