Cade Daniel
Ready for review cc @LiuXiaoxuanPKU
> @cadedaniel really awesome series of changes! I assume the answer is no, but does the draft model also have its own KV cache? If yes, where is it created...
Can you help me understand the problem better, @youkaichao? I want to understand if it's something we can solve with deltas, plus moving the on-device fields to worker state...
OK. @ruisearch42 will collect numbers and report here.
LMK once it's ready for review @sroy745
Awesome. Will take a look tomorrow.
Oh, I just saw your response > Yeah I thought of adding such an e2e test but I couldn't find an easy way to access the metrics_collector and the stats....
Merged!
Could we add a test to this PR?
I think one could mock the output of the model to be an invalid token wrt the grammar.
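A minimal sketch of that testing idea, using only `unittest.mock` from the standard library. Everything else here is hypothetical and self-contained: `ToyModel`, `generate_one`, and the set `ALLOWED_TOKEN_IDS` stand in for the real model and grammar-constrained decoding path, which are not shown in this thread. The point is only the pattern: patch the model's output so it returns a token the grammar forbids, then assert that the code under test catches it.

```python
from unittest import mock

# Hypothetical toy "grammar": the token ids allowed at the current step.
ALLOWED_TOKEN_IDS = {1, 2, 3}


class ToyModel:
    """Hypothetical stand-in for a model; the test replaces next_token with a mock."""

    def next_token(self, prompt_ids):
        raise NotImplementedError("replaced by a mock in the test")


def generate_one(model, prompt_ids, allowed):
    """Hypothetical decoding step that rejects tokens the grammar forbids."""
    token = model.next_token(prompt_ids)
    if token not in allowed:
        raise ValueError(f"token {token} violates the grammar")
    return token


def test_invalid_token_is_rejected() -> bool:
    model = ToyModel()
    # Force the model to emit token 99, which the toy grammar does not allow.
    with mock.patch.object(ToyModel, "next_token", return_value=99):
        try:
            generate_one(model, [0], ALLOWED_TOKEN_IDS)
        except ValueError:
            return True  # the invalid token was caught, as expected
    return False


if __name__ == "__main__":
    print(test_invalid_token_is_rejected())
```

In a real test the patch target would be whichever object produces the sampled token ids, and the assertion would check however the system is expected to react (error, masked output, etc.); that part depends on the code under review.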