Cade Daniel

Results 121 comments of Cade Daniel

> @cadedaniel really awesome series of changes! I assume the answer is no, but does the draft model also have its own KV cache? If yes, where is it created...

Can you help me understand the problem better, @youkaichao? I want to understand if it's something we can solve with deltas, plus moving the on-device fields to worker state...

OK. @ruisearch42 will collect numbers and report here.

Oh, I just saw your response: > Yeah I thought of adding such an e2e test but I couldn't find an easy way to access the metrics_collector and the stats....

Could we add a test to this PR?

I think one could mock the output of the model to be an invalid token wrt the grammar.
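To make the mocking idea concrete, here is a rough sketch using only `unittest.mock` from the standard library. Everything project-specific in it is an assumption made for illustration: `ALLOWED_TOKEN_IDS`, `GrammarViolation`, `guided_step`, and the `model.sample()` method are hypothetical stand-ins, not the actual interfaces in this codebase. The point is only that a mocked model can be forced to emit a token the grammar does not allow, so the test can check that the invalid token is caught.

```python
from unittest.mock import Mock

# Hypothetical toy grammar: the token ids allowed at the current step.
ALLOWED_TOKEN_IDS = {1, 2, 3}


class GrammarViolation(Exception):
    """Hypothetical error raised when a token is not allowed by the grammar."""


def guided_step(model, allowed_token_ids):
    """Hypothetical decode step: take one token from the model and validate it."""
    token_id = model.sample()
    if token_id not in allowed_token_ids:
        raise GrammarViolation(f"token {token_id} is not allowed by the grammar")
    return token_id


def invalid_token_is_rejected():
    """Return True if a mocked invalid token triggers GrammarViolation."""
    model = Mock()
    # Force the mocked model to emit a token outside ALLOWED_TOKEN_IDS.
    model.sample.return_value = 99
    try:
        guided_step(model, ALLOWED_TOKEN_IDS)
    except GrammarViolation:
        return True
    return False


def valid_token_passes_through():
    """Return the token when the mocked model emits one the grammar allows."""
    model = Mock()
    model.sample.return_value = 2
    return guided_step(model, ALLOWED_TOKEN_IDS)
```

In a real test the mock would replace whatever component produces the model output, and the assertion would target the project's own handling of grammar-invalid tokens rather than this toy exception.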