
Results 161 comments of Cody Yu

Not really. The PR you pointed out only uses FP6/8 checkpoints. The compute is still in FP16.
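To illustrate the distinction (a minimal sketch of weight-only quantization in general, not vLLM's actual FP6/FP8 kernels; int8 stands in for the low-precision storage format): the checkpoint stores weights compactly, but they are upconverted so the matmul itself still runs in FP16.

```python
import numpy as np

def quantize_weights(w_fp16, n_bits=8):
    # Simple symmetric per-tensor quantization as a stand-in for an
    # FP6/FP8 checkpoint format (illustrative only).
    scale = np.abs(w_fp16).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(w_fp16 / scale).astype(np.int8)  # stored compactly
    return q, scale

def forward(x_fp16, q, scale):
    # Dequantize back to FP16 before the GEMM: the compute stays FP16.
    w_fp16 = q.astype(np.float16) * np.float16(scale)
    return x_fp16 @ w_fp16

w = np.random.randn(4, 4).astype(np.float16)
x = np.random.randn(2, 4).astype(np.float16)
q, s = quantize_weights(w)
y = forward(x, q, s)
assert y.dtype == np.float16  # output precision is FP16, not FP8
```

The point is that only the storage format is low-precision; the arithmetic (and therefore the compute cost) is unchanged from an FP16 model.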

Thanks for the clarification. Then we can close this issue I suppose?

> Why was it not covered by existing tests?

I guess existing tests directly instantiated the worker, but this is more like an end-to-end path starting from a higher level?

@cadedaniel this should be able to merge.

The fix PR is here: #4672. Meanwhile, @cadedaniel adjusted the test config to work around this issue in #4592, so we should be good after merging this one.

The current speculative decoding performance isn't good enough due to the lack of bonus tokens, so the benchmark results may be less meaningful. We should perform another round of benchmarking...

It's discussed in #4212 and should be the latter case you mentioned. On the other hand, it does affect the performance a lot in certain cases. First, without a bonus...
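To make the bonus-token effect concrete, here is a back-of-the-envelope model (my own simplification, not vLLM's exact accounting): with `k` draft tokens and an i.i.d. per-token acceptance probability `a`, rejection at draft position `i` still yields `i + 1` tokens (the accepted prefix plus the corrected token from the target model), but when all `k` drafts are accepted, the extra target-model token is only emitted if bonus tokens are supported.

```python
def expected_tokens_per_step(k: int, a: float, bonus: bool) -> float:
    """Expected tokens emitted per verification step under a simplified
    model with i.i.d. per-token acceptance probability `a`."""
    total = 0.0
    for i in range(k):
        # Rejection at draft position i: i accepted drafts + 1 corrected token.
        total += (a ** i) * (1 - a) * (i + 1)
    # All k drafts accepted: +1 bonus token only if bonus is supported.
    total += (a ** k) * ((k + 1) if bonus else k)
    return total

# With a high acceptance rate, the all-accepted case is common, so the
# missing bonus token costs exactly a**k tokens per step in expectation.
gap = (expected_tokens_per_step(4, 0.8, True)
       - expected_tokens_per_step(4, 0.8, False))
assert abs(gap - 0.8 ** 4) < 1e-9
```

This matches the intuition above: the better the draft model (higher `a`), the more the lack of a bonus token hurts.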

Updated the reject sampler (cc @cadedaniel for reviewing these reject sampler API changes):

* Enable bonus token for PLD.
* Disable `strict_mode` using the environment variable `VLLM_DISABLE_REJECT_SAMPLING_STRICT_MODE`.

With these updates,...
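For reference, the usual pattern for such a flag is to read the environment variable once and skip the expensive checks when it is set. This is a minimal sketch of the pattern only; apart from the variable name above, the function and the specific invariant are hypothetical.

```python
import os

# Read the flag once; default is strict mode enabled.
STRICT_MODE_DISABLED = os.environ.get(
    "VLLM_DISABLE_REJECT_SAMPLING_STRICT_MODE", "0") in ("1", "true", "True")

def maybe_check_outputs(sampled, accepted):
    """Run sanity checks on sampler outputs unless strict mode is disabled.

    Hypothetical invariant for illustration: every accepted token must
    appear among the sampled candidates.
    """
    if STRICT_MODE_DISABLED:
        return
    assert set(accepted) <= set(sampled), "accepted token not in candidates"
```

Disabling the checks via an environment variable avoids plumbing a new argument through the sampler API while still letting CI keep strict mode on.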

> tests failing unfortunately

The failed tests seem flaky. I found the same 2 spec decode tests failing on the main branch as well. We should retry all failed tests.

Thanks for fixing that! Could you help retry the failed tests again?