Chanjun
What version did you use? marlin? gemm? gemv? gemv_fast?
> What version did you use? marlin? gemm? gemv? gemv_fast?

I use gemm, thanks.
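(Not from the thread itself, just context: the GEMM/GEMV naming above matches AutoAWQ's `version` field in its quantization config. A minimal sketch, assuming an AutoAWQ workflow; the model and output paths are placeholders.)

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "your/base-model"        # placeholder input checkpoint
quant_path = "your/base-model-awq"    # placeholder output path

# "version" selects the packed kernel layout: "GEMM" or "GEMV".
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize and save the AWQ checkpoint that vLLM can later load.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```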
Can you provide a deterministic version?
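(A hedged sketch, not an answer from the thread: one way to check whether a given kernel path behaves deterministically is to run the same prompt twice with greedy decoding and compare the outputs. The model path is a placeholder for the AWQ checkpoint being tested.)

```python
from vllm import LLM, SamplingParams

# Placeholder model path; swap in the quantized checkpoint under test.
llm = LLM(model="your/base-model-awq", quantization="awq")

params = SamplingParams(temperature=0.0, max_tokens=64)
prompt = "Explain pipeline parallelism in one sentence."

# Greedy decoding twice with an identical prompt; if the underlying
# GEMM/GEMV kernels are non-deterministic, the two outputs can diverge.
out1 = llm.generate([prompt], params)[0].outputs[0].text
out2 = llm.generate([prompt], params)[0].outputs[0].text
print("deterministic:", out1 == out2)
```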
I also meet this problem, and if the error happens on a worker it can sometimes make the service appear stuck. Hi @ruisearch42, I think this error may be the...
Hi @markluofd, do you have any further findings?
It's strange: the shape of `self.intermediate_tensors` has become smaller. Could the CUDA graph be modifying it? Hi @WoosukKwon, could you help?
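(Not part of the original report, just a hedged debugging sketch: one way to see whether a persistent buffer is resized or rebound around CUDA graph capture/replay is to record its shape and storage pointer at each step. The tensor below is a stand-in for `intermediate_tensors`, not vLLM's actual buffer.)

```python
import torch

def snapshot(label, t):
    # Record identity, shape and storage pointer so later comparisons
    # reveal whether the buffer was resized or replaced.
    print(f"{label}: id={id(t)} shape={tuple(t.shape)} ptr={t.data_ptr()}")

# Stand-in for the persistent intermediate tensor buffer.
buf = torch.zeros(8, 4096, device="cuda")
snapshot("before capture", buf)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    buf.mul_(2.0)  # in-place op on the persistent buffer gets captured

snapshot("after capture", buf)
g.replay()
snapshot("after replay", buf)
```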
@ruisearch42 hi, I tested with 0.8.2, but in my env (8 standalone machines, 2 GPUs each) the service crashes soon: https://github.com/vllm-project/vllm/issues/15102#issuecomment-2764948930. I also tested with commit dc74613fa26b04e2664b41b3d3441136eb4534a6 and still got this runtime error, even...
@ruisearch42 hi, in my latest tests the runtime error still exists, and I found it may be a Ray bug with the Ray compiled graph. Worker 0 receives scheduler_output and intermediate_tensors...
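(A hedged illustration, not vLLM's executor code: a minimal Ray compiled-graph pipeline where one actor's output is passed to the next actor, roughly the shape of the scheduler_output/intermediate_tensors hand-off described above. The `Worker` class and its `forward` method are hypothetical stand-ins, assuming Ray's compiled-graph API.)

```python
import ray
import torch
from ray.dag import InputNode

@ray.remote
class Worker:
    # Hypothetical stand-in for one pipeline-parallel rank.
    def forward(self, x):
        return x + 1  # pretend this is one pipeline stage

ray.init()
w0, w1 = Worker.remote(), Worker.remote()

# Build the DAG: input -> worker 0 -> worker 1, then compile it so the
# edges become reusable channels instead of per-call object transfers.
with InputNode() as inp:
    dag = w1.forward.bind(w0.forward.bind(inp))
compiled = dag.experimental_compile()

out = ray.get(compiled.execute(torch.zeros(4)))
print(out)  # tensor of twos: both stages ran
```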
> Hi [@MichoChan](https://github.com/MichoChan), what is worker 0 and what is master 0? Can you share the whole command you used to launch vllm?
>
> Also your code format...
> [@MichoChan](https://github.com/MichoChan) are you trying TP=16, or TP=8 PP=2?
>
> 0 is not a proper value for `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL`, so it's probably not taking effect. Can you remove this env...
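(For context, a hedged sketch of the TP=8 PP=2 layout being asked about, using vLLM's offline `LLM` entry point with the Ray backend; the model path is a placeholder and this is not the exact command used in the thread.)

```python
from vllm import LLM, SamplingParams

# 16 GPUs split as tensor parallel 8 and pipeline parallel 2,
# scheduled across nodes by the Ray executor backend.
llm = LLM(
    model="your/base-model",            # placeholder path
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```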