Hi @andoorve, while benchmarking with your PR I've consistently hit engine timeouts with smaller models on setups well below total VRAM capacity, which might relate to the issues you've linked...
I see you are using multi-step, so it could also be related to https://github.com/vllm-project/vllm/pull/8403, which is now merged.
I forgot to handle the multiproc case. Will make a PR. For now set `--worker-use-ray` to use the ray backend and it should work.
For the timeout issue try setting the env var: `VLLM_RPC_GET_DATA_TIMEOUT_MS=1800000`
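For example, a minimal launch sketch combining both workarounds, assuming the OpenAI-compatible server entrypoint (the model name and port are placeholders, adjust for your setup):

```bash
# Raise the RPC timeout to 30 minutes and fall back to the Ray backend.
export VLLM_RPC_GET_DATA_TIMEOUT_MS=1800000

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --worker-use-ray \
    --port 8000
```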
You could try increasing the max batch size with `--max-num-seqs`. By default it is 256, which may be too small for an fp8 8B model.
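For example (the model name and the value 512 are illustrative placeholders, not recommendations):

```bash
# Allow up to 512 concurrent sequences per scheduler iteration
# instead of the default 256.
python -m vllm.entrypoints.openai.api_server \
    --model neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 \
    --max-num-seqs 512
```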
flashinfer+multi-step will be supported by this PR https://github.com/vllm-project/vllm/pull/7928
The PR is merged now.
Let me try to reproduce on my end and take a look. Meanwhile, @ashgold, @br3no, could you please try `--disable-async-output-proc` and see if that changes anything?
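For reference, a sketch of where the flag goes, assuming the OpenAI-compatible server entrypoint (the model name is a placeholder):

```bash
# Disable async output processing to rule it out as the culprit.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --disable-async-output-proc
```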
I don't think I can reproduce it, but it's probably not caused by my multi-step changes, since multi-step is disabled by default.
cc @alexm-neuralmagic @megha95