Chanjun
What version did you use? marlin? gemm? gemv? gemv_fast?
> What version did you use? marlin? gemm? gemv? gemv_fast?

I use gemm, thanks.
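(Not from the thread itself, just context: the GEMM/GEMV naming above matches AutoAWQ's `version` field in its quantization config. A minimal sketch, assuming an AutoAWQ workflow; the model and output paths are placeholders.)

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "your/base-model"        # placeholder input checkpoint
quant_path = "your/base-model-awq"    # placeholder output path

# "version" selects the packed kernel layout: "GEMM" or "GEMV".
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize and save the AWQ checkpoint that vLLM can later load.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```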
Can you provide a deterministic version?
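(A hedged sketch, not an answer from the thread: one way to check whether a given kernel path behaves deterministically is to run the same prompt twice with greedy decoding and compare the outputs. The model path is a placeholder for the AWQ checkpoint being tested.)

```python
from vllm import LLM, SamplingParams

# Placeholder model path; swap in the quantized checkpoint under test.
llm = LLM(model="your/base-model-awq", quantization="awq")

params = SamplingParams(temperature=0.0, max_tokens=64)
prompt = "Explain pipeline parallelism in one sentence."

# Greedy decoding twice with an identical prompt; if the underlying
# GEMM/GEMV kernels are non-deterministic, the two outputs can diverge.
out1 = llm.generate([prompt], params)[0].outputs[0].text
out2 = llm.generate([prompt], params)[0].outputs[0].text
print("deterministic:", out1 == out2)
```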
I also meet this problem, and if the error happens on a worker it can sometimes make the service appear stuck. Hi @ruisearch42, I think this error may be the...
Hi @markluofd, do you have any further findings?
It's strange: the shape of `self.intermediate_tensors` has become smaller. Could the CUDA graph be modifying it? Hi @WoosukKwon, could you help?
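(Not part of the original report, just a hedged debugging sketch: one way to see whether a persistent buffer is resized or rebound around CUDA graph capture/replay is to record its shape and storage pointer at each step. The tensor below is a stand-in for `intermediate_tensors`, not vLLM's actual buffer.)

```python
import torch

def snapshot(label, t):
    # Record identity, shape and storage pointer so later comparisons
    # reveal whether the buffer was resized or replaced.
    print(f"{label}: id={id(t)} shape={tuple(t.shape)} ptr={t.data_ptr()}")

# Stand-in for the persistent intermediate tensor buffer.
buf = torch.zeros(8, 4096, device="cuda")
snapshot("before capture", buf)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    buf.mul_(2.0)  # in-place op on the persistent buffer gets captured

snapshot("after capture", buf)
g.replay()
snapshot("after replay", buf)
```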
@ruisearch42 hi, I tested with 0.8.2, but in my env (8 standalone machines, 2 GPUs each) the service crashes soon: https://github.com/vllm-project/vllm/issues/15102#issuecomment-2764948930. I also tested with commit dc74613fa26b04e2664b41b3d3441136eb4534a6 and still got this runtime error, even...
@ruisearch42 hi, in my latest tests the runtime error still exists, and I found it may be a Ray bug with the Ray compiled graph. Worker 0 receives scheduler_output and intermediate_tensors...
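(A hedged illustration, not vLLM's executor code: a minimal Ray compiled-graph pipeline where one actor's output is passed to the next actor, roughly the shape of the scheduler_output/intermediate_tensors hand-off described above. The `Worker` class and its `forward` method are hypothetical stand-ins, assuming Ray's compiled-graph API.)

```python
import ray
import torch
from ray.dag import InputNode

@ray.remote
class Worker:
    # Hypothetical stand-in for one pipeline-parallel rank.
    def forward(self, x):
        return x + 1  # pretend this is one pipeline stage

ray.init()
w0, w1 = Worker.remote(), Worker.remote()

# Build the DAG: input -> worker 0 -> worker 1, then compile it so the
# edges become reusable channels instead of per-call object transfers.
with InputNode() as inp:
    dag = w1.forward.bind(w0.forward.bind(inp))
compiled = dag.experimental_compile()

out = ray.get(compiled.execute(torch.zeros(4)))
print(out)  # tensor of twos: both stages ran
```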
> Hi [@MichoChan](https://github.com/MichoChan), what is worker 0 and what is master 0? Can you share the whole command you used to launch vllm?
>
> Also your code format...
> [@MichoChan](https://github.com/MichoChan) are you trying TP=16, or TP=8 PP=2?
>
> 0 is not a proper value for `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL`, so it's probably not taking effect. Can you remove this env...
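(For context, a hedged sketch of the TP=8 PP=2 layout being asked about, using vLLM's offline `LLM` entry point with the Ray backend; the model path is a placeholder and this is not the exact command used in the thread.)

```python
from vllm import LLM, SamplingParams

# 16 GPUs split as tensor parallel 8 and pipeline parallel 2,
# scheduled across nodes by the Ray executor backend.
llm = LLM(
    model="your/base-model",            # placeholder path
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```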