Chanjun


> > [@ruisearch42](https://github.com/ruisearch42) I removed `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL`, and I get the same runtime error
>
> How about if you only have `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL=0` and don't have `--disable-custom-all-reduce` (i.e., using custom allreduce)? Do...

> [@MichoChan](https://github.com/MichoChan), I suggest you use vLLM 0.8.3 and set `VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE` to `shm`. `VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL=0` doesn't actually turn off NCCL with ray 2.42+ since there is an API update.
> ...
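For reference, a minimal sketch of what that suggestion could look like in code (my assumption, not from the thread; the model and parallelism sizes mirror the launch command quoted in the next comment):

```
# A minimal sketch, assuming vLLM 0.8.3 with Ray 2.42+. The env var must be
# set before vLLM spawns its Ray workers, so set it before constructing LLM.
import os

os.environ["VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE"] = "shm"  # shared-memory channels instead of NCCL

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",  # the compiled-DAG path applies to the Ray backend
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```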

> Hi [@MichoChan](https://github.com/MichoChan), I tested with the following:
>
> ```
> VLLM_USE_V1=1 python3 -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-V3 --tensor-parallel-size 8 --pipeline-parallel-size 2 --gpu-memory-utilization 0.92 --dtype auto --served-model-name deepseekv3 --max-num-seqs...
> ```

> Hi [@MichoChan](https://github.com/MichoChan), I was referring to this retry: https://github.com/ray-project/ray/blob/8e0bc7093fc3f71795147a32146bbcc8b2f393f2/python/ray/experimental/channel/common.py#L401
>
> > and I tested with pp=8, tp=2, using 8 machines, 2 GPUs per machine
>
> Yes....

I tested, but that hotfix can break the graph when compiling with fullgraph. If fullgraph is not used, the compilation succeeds and the speed is fine, but I need fullgraph when...
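A self-contained illustration of that failure mode (a sketch, not vLLM code; `torch._dynamo.graph_break()` stands in for whatever op in the hotfix breaks the graph):

```
import torch

def model(x):
    torch._dynamo.graph_break()  # stand-in for the graph-breaking op
    return x * 2

x = torch.randn(4)

# Without fullgraph, Dynamo splits around the break and compilation succeeds.
print(torch.compile(model)(x))

# With fullgraph=True, the same break is a hard compile error.
try:
    torch.compile(model, fullgraph=True)(x)
except Exception as e:
    print(type(e).__name__)
```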

> @MichoChan how do you use vllm with torch.compile?

torch.compile within vLLM itself is fine, but when I use vLLM's compilation implementation in my own framework, my model code leads to...

> @MichoChan how do you use vllm with torch.compile?
>
> > so flashinfer 0.20.0 can't use torch compile full graph
>
> Can you explain this? I don't see why...

This PR, https://github.com/vllm-project/vllm/pull/11108/files, may work as expected.