Cody Yu
Sorry, we're busy with the company event (Ray Summit) until this week. Will try to find some time after the event to look into it. @SolitaryThinker could you also take...
The CI is already triggered.
> @comaniac how can I trigger the CI? I have no dev env for vllm currently

Does that mean you cannot verify this PR locally? We should avoid using CI...
> > > Can not work on NVIDIA Ampere GPU, for example 3090.
> >
> > Unfortunate limit of Triton
>
> Does [#5975](https://github.com/vllm-project/vllm/pull/5975) help for this?...
btw did you test on H100?
@robertgshaw2-neuralmagic we are also suffering from the illegal memory access even before this refactoring. It's weird because I didn't find this issue in v0.5.0 and it's still working for me...
> @robertgshaw2-neuralmagic @comaniac There is a potential risk of illegal memory access, I have made changes but have not yet submitted them. Please refer to: [add_device_gurad](https://github.com/jeejeelee/vllm/blob/fix-moe-kernel/csrc/moe_align_block_size_kernels.cu#L115)

Interesting. Do you think the...
Thanks for the detailed steps, which are helpful. In the e2e case I believe vllm would make sure all tensors are on the right device, so this shouldn't be an...
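(For context: a minimal sketch, assuming the linked `add_device_gurad` change follows the common PyTorch extension pattern of guarding the kernel launch with the input tensor's device. The `touch` kernel/function names below are placeholders, not vLLM's actual `moe_align_block_size` code; the point is just that without such a guard, a caller whose current CUDA device differs from the tensor's device can trigger an illegal memory access.)

```cpp
// Sketch of the RAII device-guard pattern for a PyTorch CUDA extension op.
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

namespace {
// Placeholder kernel standing in for the real MoE alignment kernel.
__global__ void touch_kernel(int32_t* data, int64_t n) {
  int64_t idx = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (idx < n) data[idx] += 1;
}
}  // namespace

void touch(torch::Tensor t) {
  TORCH_CHECK(t.is_cuda(), "expected a CUDA tensor");
  TORCH_CHECK(t.scalar_type() == torch::kInt32, "expected int32");
  // RAII guard: sets the current CUDA device to t's device for this scope,
  // then restores the previous device. Without it, the launch below would
  // run on whatever device the caller happens to have current.
  const at::cuda::OptionalCUDAGuard device_guard(device_of(t));
  const cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  const int64_t n = t.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  touch_kernel<<<blocks, threads, 0, stream>>>(t.data_ptr<int32_t>(), n);
}
```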
Some points per offline discussion with @ruisearch42:

- This is expected and a normal termination process in Ray. The "error" log is more for debugging purposes.
- To hide...
Hmm, I'm not sure we want to have benchmarks/evals. For correctness checking in the CI, we should be able to just test 2-3 cases to keep it stable.