Mike Iovine
The fix was already merged into main. Someone evidently ran into this bug and fixed it before the nvbug ever ended up on my desk; it just took a week...
/bot run --disable-fail-fast
/bot skip --comment "Confirmed that both of the newly added chunked prefill tests pass. There are no other code changes in this PR"
Adding in a comment from @lfr-0531 from the MR to the old repo:

> What do you think if we add a new spec_executor.py to speculative/? Then we can add...
I have decided to allow the usage of the EAGLE3 checkpoints provided by the original paper authors on HuggingFace: https://huggingface.co/yuhuili/EAGLE3-DeepSeek-R1-Distill-LLaMA-8B

We will need a few hacks on our side to...
I have decided to separate the `KVCacheManager`s for the target and draft models. This has the following advantages:

1. Avoids `_LAYER_INDEX_OFFSET` hack when creating the models
2. Lets us support...
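
To illustrate the first point, here is a minimal sketch of why separate cache managers would remove the need for a layer index offset. Everything in it is hypothetical: `ToyKVCacheManager`, the layer counts, and `LAYER_INDEX_OFFSET` are stand-ins for illustration, not the real `KVCacheManager` API. It also assumes the offset exists to shift draft-model layer indices past the target model's layers in a shared manager.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCacheManager:
    """Hypothetical stand-in for a per-model KV cache manager (not the real class)."""

    num_layers: int
    # Maps a layer index to the list of (key, value) entries cached for it.
    cache: dict = field(default_factory=dict)

    def append(self, layer_idx: int, kv: tuple) -> None:
        if not 0 <= layer_idx < self.num_layers:
            raise IndexError(
                f"layer {layer_idx} out of range for {self.num_layers} layers"
            )
        self.cache.setdefault(layer_idx, []).append(kv)


# Assumed layer counts, chosen only for illustration.
NUM_TARGET_LAYERS = 32
NUM_DRAFT_LAYERS = 1

# Shared manager: draft layers must be shifted past the target layers,
# so the draft model has to know about the target model's depth.
shared = ToyKVCacheManager(NUM_TARGET_LAYERS + NUM_DRAFT_LAYERS)
LAYER_INDEX_OFFSET = NUM_TARGET_LAYERS  # the kind of offset being avoided
shared.append(LAYER_INDEX_OFFSET + 0, ("k", "v"))

# Separate managers: each model indexes its own layers starting from 0.
target_cache = ToyKVCacheManager(NUM_TARGET_LAYERS)
draft_cache = ToyKVCacheManager(NUM_DRAFT_LAYERS)
draft_cache.append(0, ("k", "v"))
```

With separate managers, the draft model's first layer is simply layer 0 of its own cache, so model construction would not need to account for the other model's depth.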