Chanjun


> > [@ruisearch42](https://github.com/ruisearch42) I removed `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL`, and I get the same runtime error
>
> How about if you only have `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL=0` and don't have `--disable-custom-all-reduce` (i.e., using custom allreduce)? Do...

> [@MichoChan](https://github.com/MichoChan), I suggest you use vLLM 0.8.3 and set `VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE` to `shm`. `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL=0` doesn't actually turn off NCCL with ray 2.42+ since there is an API update.
> ...
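For reference, a minimal sketch of what that suggestion could look like in code (my assumption, not from the thread; the model and parallelism sizes mirror the launch command quoted in the next comment):

```
# A minimal sketch, assuming vLLM 0.8.3 with Ray 2.42+. The env var must be
# set before vLLM spawns its Ray workers, so set it before constructing LLM.
import os

os.environ["VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE"] = "shm"  # shared-memory channels instead of NCCL

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",  # the compiled-DAG path applies to the Ray backend
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```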

> Hi [@MichoChan](https://github.com/MichoChan), I tested with the following:
>
> ```
> VLLM_USE_V1=1 python3 -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8 --pipeline-parallel-size 2 --gpu-memory-utilization 0.92 --dtype auto --served-model-name deepseekv3 --max-num-seqs...
> ```

> Hi [@MichoChan](https://github.com/MichoChan), I was referring to this retry: https://github.com/ray-project/ray/blob/8e0bc7093fc3f71795147a32146bbcc8b2f393f2/python/ray/experimental/channel/common.py#L401
>
> > and I tested with pp=8, tp=2, using 8 machines, 2 GPUs per machine
>
> Yes....

I tested, but that hotfix can break the graph when compiling with fullgraph. If fullgraph is not used, the compilation succeeds and the speed is fine, but I need fullgraph when...
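A self-contained illustration of that failure mode (a sketch, not vLLM code; `torch._dynamo.graph_break()` stands in for whatever op in the hotfix breaks the graph):

```
import torch

def model(x):
    torch._dynamo.graph_break()  # stand-in for the graph-breaking op
    return x * 2

x = torch.randn(4)

# Without fullgraph, Dynamo splits around the break and compilation succeeds.
print(torch.compile(model)(x))

# With fullgraph=True, the same break is a hard compile error.
try:
    torch.compile(model, fullgraph=True)(x)
except Exception as e:
    print(type(e).__name__)
```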

> @MichoChan how do you use vllm with torch.compile?

torch.compile within vLLM itself is fine, but when I use vLLM's compilation implementation in my own framework, my model code leads to...

> @MichoChan how do you use vllm with torch.compile?
>
> > so flashinfer 0.20.0 can't use torch compile full graph
>
> Can you explain this? I don't see why...

This PR, https://github.com/vllm-project/vllm/pull/11108/files, may work as expected.