
CUDA graph does not match block table when enforce_eager=False

Open Guozhenyuan opened this issue 2 months ago • 0 comments

I get the following error when running with enforce_eager=False:

Generating: 88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 44/50 [02:56<01:30, 15.06s/it, Prefill=5681tok/s, Decode=590tok/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/zju_wck/gzy/nano-vllm/vllm_infer.py", line 123, in
[rank0]:     results = llm.generate(texts, sampling_params)
[rank0]:   File "/zju_wck/gzy/nano-vllm/nanovllm/engine/llm_engine.py", line 76, in generate
[rank0]:     output, num_tokens = self.step()
[rank0]:   File "/zju_wck/gzy/nano-vllm/nanovllm/engine/llm_engine.py", line 51, in step
[rank0]:     token_ids = self.model_runner.call("run", seqs, is_prefill)
[rank0]:   File "/zju_wck/gzy/nano-vllm/nanovllm/engine/model_runner.py", line 97, in call
[rank0]:     return method(*args)
[rank0]:   File "/zju_wck/gzy/nano-vllm/nanovllm/engine/model_runner.py", line 225, in run
[rank0]:     logits = self.run_model(input_ids, positions, is_prefill)
[rank0]:   File "/root/anaconda3/envs/rlm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/zju_wck/gzy/nano-vllm/nanovllm/engine/model_runner.py", line 216, in run_model
[rank0]:     graph_vars["block_tables"][:bs, :context.block_tables.size(1)] = context.block_tables
[rank0]:   File "/root/anaconda3/envs/rlm/lib/python3.10/site-packages/torch/utils/_device.py", line 104, in torch_function
[rank0]:     return func(*args, **kwargs)
[rank0]: RuntimeError: The expanded size of the tensor (64) must match the existing size (65) at non-singleton dimension 1. Target sizes: [6, 64]. Tensor sizes: [6, 65]
Generating: 88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 44/50 [02:57<00:24, 4.03s/it, Prefill=5681tok/s, Decode=590tok/s]
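Reading the error message literally, the preallocated `graph_vars["block_tables"]` buffer has 64 columns, while the block table for this batch has 65, so the slice assignment cannot fit. The sketch below is only a plain-Python illustration of that mismatch, not nano-vllm code: `blocks_needed`, `copy_block_tables`, `BLOCK_SIZE` and `MAX_MODEL_LEN` are hypothetical names, and the values are assumptions chosen so that the numbers come out to 64 and 65. It shows how a sequence one token longer than the length used to size the buffer would need one extra block, and how a guard before the copy could report the overflow more clearly.

```python
from math import ceil


def blocks_needed(seq_len: int, block_size: int) -> int:
    """Number of KV-cache blocks a sequence of seq_len tokens occupies."""
    return ceil(seq_len / block_size)


def copy_block_tables(graph_buf, block_tables):
    """Copy per-sequence block tables into a fixed-width buffer.

    graph_buf stands in for the preallocated graph_vars["block_tables"];
    both arguments are plain lists of lists here, not tensors.
    """
    capacity = len(graph_buf[0])
    width = max(len(row) for row in block_tables)
    if width > capacity:
        # Fail early with a message that names both sizes.
        raise ValueError(
            f"block table needs {width} columns but the buffer has {capacity}"
        )
    for i, row in enumerate(block_tables):
        # Overwrite the leading entries; the rest of the row is left untouched.
        graph_buf[i][:len(row)] = row
    return graph_buf


# Hypothetical numbers, chosen only to reproduce the 64-vs-65 mismatch.
BLOCK_SIZE = 256        # assumed tokens per KV-cache block
MAX_MODEL_LEN = 16384   # assumed length used to size the buffer

capacity = blocks_needed(MAX_MODEL_LEN, BLOCK_SIZE)    # columns in the buffer
needed = blocks_needed(MAX_MODEL_LEN + 1, BLOCK_SIZE)  # one token past the limit
```

Under these assumptions, any sequence whose prompt plus generated tokens exceeds the length used to size the buffer would trigger the same kind of failure; whether that is what happens in this run depends on the actual configuration, which the traceback does not show.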

Guozhenyuan, Sep 25 '25 14:09