
Results 161 comments of Cody Yu

Not really. The PR you pointed out only uses FP6/8 checkpoints. The compute is still in FP16.
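To illustrate the distinction (a minimal sketch of weight-only quantization in general, not vLLM's actual FP6/FP8 kernels; int8 stands in for the low-precision storage format): the checkpoint stores weights compactly, but they are upconverted so the matmul itself still runs in FP16.

```python
import numpy as np

def quantize_weights(w_fp16, n_bits=8):
    # Simple symmetric per-tensor quantization as a stand-in for an
    # FP6/FP8 checkpoint format (illustrative only).
    scale = np.abs(w_fp16).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(w_fp16 / scale).astype(np.int8)  # stored compactly
    return q, scale

def forward(x_fp16, q, scale):
    # Dequantize back to FP16 before the GEMM: the compute stays FP16.
    w_fp16 = q.astype(np.float16) * np.float16(scale)
    return x_fp16 @ w_fp16

w = np.random.randn(4, 4).astype(np.float16)
x = np.random.randn(2, 4).astype(np.float16)
q, s = quantize_weights(w)
y = forward(x, q, s)
assert y.dtype == np.float16  # output precision is FP16, not FP8
```

The point is that only the storage format is low-precision; the arithmetic (and therefore the compute cost) is unchanged from an FP16 model.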

Thanks for the clarification. Then we can close this issue I suppose?

> Why was it not covered by existing tests?

I guess existing tests directly instantiated the worker, but this is more like an end-to-end path starting from a higher level?

@cadedaniel this should be able to merge.

The fix PR is here: #4672. Meanwhile, @cadedaniel adjusted the test config to work around this issue in #4592, so we should be good after merging this one.

The current speculative decoding performance isn't good enough due to the lack of bonus tokens, so the benchmark results may be less meaningful. We should perform another round of benchmarking...

It's discussed in #4212 and should be the latter case you mentioned. On the other hand, it does affect the performance a lot in certain cases. First, without a bonus...
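To make the bonus-token effect concrete, here is a back-of-the-envelope model (my own simplification, not vLLM's exact accounting): with `k` draft tokens and an i.i.d. per-token acceptance probability `a`, rejection at draft position `i` still yields `i + 1` tokens (the accepted prefix plus the corrected token from the target model), but when all `k` drafts are accepted, the extra target-model token is only emitted if bonus tokens are supported.

```python
def expected_tokens_per_step(k: int, a: float, bonus: bool) -> float:
    """Expected tokens emitted per verification step under a simplified
    model with i.i.d. per-token acceptance probability `a`."""
    total = 0.0
    for i in range(k):
        # Rejection at draft position i: i accepted drafts + 1 corrected token.
        total += (a ** i) * (1 - a) * (i + 1)
    # All k drafts accepted: +1 bonus token only if bonus is supported.
    total += (a ** k) * ((k + 1) if bonus else k)
    return total

# With a high acceptance rate, the all-accepted case is common, so the
# missing bonus token costs exactly a**k tokens per step in expectation.
gap = (expected_tokens_per_step(4, 0.8, True)
       - expected_tokens_per_step(4, 0.8, False))
assert abs(gap - 0.8 ** 4) < 1e-9
```

This matches the intuition above: the better the draft model (higher `a`), the more the lack of a bonus token hurts.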

Updated the reject sampler (cc @cadedaniel for reviewing these reject sampler API changes):

* Enable bonus token for PLD.
* Disable `strict_mode` using the environment variable `VLLM_DISABLE_REJECT_SAMPLING_STRICT_MODE`.

With these updates,...
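For reference, the usual pattern for such a flag is to read the environment variable once and skip the expensive checks when it is set. This is a minimal sketch of the pattern only; apart from the variable name above, the function and the specific invariant are hypothetical.

```python
import os

# Read the flag once; default is strict mode enabled.
STRICT_MODE_DISABLED = os.environ.get(
    "VLLM_DISABLE_REJECT_SAMPLING_STRICT_MODE", "0") in ("1", "true", "True")

def maybe_check_outputs(sampled, accepted):
    """Run sanity checks on sampler outputs unless strict mode is disabled.

    Hypothetical invariant for illustration: every accepted token must
    appear among the sampled candidates.
    """
    if STRICT_MODE_DISABLED:
        return
    assert set(accepted) <= set(sampled), "accepted token not in candidates"
```

Disabling the checks via an environment variable avoids plumbing a new argument through the sampler API while still letting CI keep strict mode on.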

> tests failing unfortunately

The failed tests seem flaky. I found the same 2 spec decode tests failing on the main branch as well. We should retry all failed tests.

Thanks for fixing that! Could you help retry the failed tests again?