Cade Daniel comments

Results: 121 comments by Cade Daniel

[Speculative decoding 4/9] Lookahead scheduling for speculative decoding

Ready for review cc @LiuXiaoxuanPKU

[Speculative decoding 4/9] Lookahead scheduling for speculative decoding

> @cadedaniel really awesome series of changes! I assume the answer is no, but does the draft model also have its own KV cache? If yes, where is it created...

[wip][Core] Introduce SPMD worker execution using Ray accelerated DAG

Can you help me understand the problem better @youkaichao ? I want to understand if it's something we can solve with deltas, plus moving the on-device fields to worker state...

[wip][Core] Introduce SPMD worker execution using Ray accelerated DAG

OK. @ruisearch42 will collect numbers and report here.

[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models

LMK once it's ready for review @sroy745

[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models

Awesome. Will take a look tomorrow.

[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models

Oh I just saw your response
> Yeah I thought of adding such an e2e test but I could find an easy way to access the metrics_collector and the stats....

[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models

Merged!

Fail request if FSM fails to advance

Could we add a test to this PR?

Fail request if FSM fails to advance

I think one could mock the output of the model to be an invalid token wrt the grammar.
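A minimal sketch of what such a test could look like, assuming a toy setup rather than the project's real classes: `ToyFSM`, `run_request`, `make_mock_model`, and the `"fsm_error"` finish reason are all hypothetical names invented for illustration. The mock model emits a scripted token that the grammar does not allow, and the test checks that the request fails instead of continuing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ToyFSM:
    """Hypothetical grammar FSM: maps (state, token_id) -> next state."""

    def __init__(self, transitions: Dict[Tuple[int, int], int], start: int = 0):
        self.transitions = transitions
        self.start = start

    def advance(self, state: int, token_id: int) -> Optional[int]:
        # None signals that the token is not allowed in this state.
        return self.transitions.get((state, token_id))


@dataclass
class RequestResult:
    tokens: List[int]
    finished_reason: str  # "length" or "fsm_error" (hypothetical labels)


def run_request(
    sample_token: Callable[[int], int], fsm: ToyFSM, max_tokens: int
) -> RequestResult:
    """Decode up to max_tokens, failing the request if the FSM cannot advance."""
    state = fsm.start
    tokens: List[int] = []
    for step in range(max_tokens):
        token_id = sample_token(step)
        next_state = fsm.advance(state, token_id)
        if next_state is None:
            # Invalid token with respect to the grammar: fail the request.
            return RequestResult(tokens, "fsm_error")
        tokens.append(token_id)
        state = next_state
    return RequestResult(tokens, "length")


def make_mock_model(outputs: List[int]) -> Callable[[int], int]:
    """Stand-in for the model: returns a scripted token at each step."""
    return lambda step: outputs[step]


# Toy grammar: token 1, then token 2, repeating.
fsm = ToyFSM({(0, 1): 1, (1, 2): 0})

# Well-formed output runs to the token limit.
valid = run_request(make_mock_model([1, 2, 1]), fsm, max_tokens=3)

# Token 99 is not in the grammar, so the request should fail at step 1.
invalid = run_request(make_mock_model([1, 99, 2]), fsm, max_tokens=3)
```

In a real test the scripted sampler would replace the model's output at whatever layer the PR hooks into; where that layer sits is not stated in this thread.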