Robert Shaw
Closed because this functionality was completed in https://github.com/vllm-project/vllm/commit/fb6af8bc086328ca6659e72d11ffd4309ce4de22
> Can you try some of the test cases in #5846, #5872, w/ and w/o chunked prefill?
>
> Additionally, you should be able to mark #4904, #4772, #5334, #5872...
> sampler test broke

Yup, it's due to chunked_prefill. I'm fixing it. @simon-mo @njhill @Yard1 will need to re-review
> Can we have a regression test? Also I have the impression the current fix won't work with chunked prefill (mainly because the second chunk won't have None for the first prompt...
Okay, chunked prefill needed more fixes than I expected. I had to back my changes out of the sampler because it required poking around too much in the sequence_data to detect...
👀
fp8 not yet supported for Qwen. WIP PR: https://github.com/vllm-project/vllm/pull/6088
Fp8 is now supported for Qwen, but MoE Fp8 requires compute_capability == 9.0 (i.e. Hopper GPUs). Our MoE kernels are currently implemented in Triton, which requires triton==3.0 for Fp8 on...
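As a minimal sketch of the hardware gate described above (this is illustrative, not vLLM's actual code): the capability tuple would come from `torch.cuda.get_device_capability()` at runtime, and the check simply requires Hopper or newer.

```python
def fp8_moe_supported(compute_capability: tuple[int, int]) -> bool:
    """Hypothetical helper: MoE Fp8 kernels need compute capability
    >= 9.0 (Hopper), per the comment above. The (major, minor) tuple
    would typically come from torch.cuda.get_device_capability().
    """
    return compute_capability >= (9, 0)


# Hopper (9.0) passes; Ada (8.9) and Ampere (8.0) do not.
print(fp8_moe_supported((9, 0)))  # True
print(fp8_moe_supported((8, 9)))  # False
```

Tuple comparison is lexicographic, so `(8, 9) < (9, 0)` correctly rejects Ada-class GPUs despite the higher minor version.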
The `AssertionError: expected running sequences` is caused by `abort` not yet being supported with `multi-step` scheduling. `multi-step` scheduling is a new feature we are still working on -...
> Two minutes later the next error:
>
> ```
> │     return self._call_impl(*args, **kwargs) │
> │   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl │
> │     return forward_call(*args, **kwargs)...
> ```