jiapingW
You need to modify some of the forward parameters. In gpt-fast's speculative sampling, the forward parameters of `Transformer` and `model_forward` are not aligned, so you have to modify them as...
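For reference, a rough sketch of the kind of alignment meant here, assuming gpt-fast's `model_forward` wrapper in `generate.py`; the extra `draft_mask` argument is purely hypothetical and only illustrates threading a new parameter through both signatures:

```python
import torch
from typing import Optional

# gpt-fast compiles a thin wrapper around the model call; its positional
# arguments must match what Transformer.forward expects, otherwise the
# speculative-decoding call sites pass arguments into the wrong slots.
def model_forward(model, x: torch.Tensor, input_pos: torch.Tensor) -> torch.Tensor:
    return model(x, input_pos)

# If Transformer.forward is extended, e.g.
#   def forward(self, idx, input_pos=None, draft_mask=None): ...
# the wrapper (and every call in speculative_decode / decode_one_token)
# has to be updated the same way. `draft_mask` is a hypothetical example.
def model_forward_aligned(model, x, input_pos, draft_mask: Optional[torch.Tensor] = None):
    return model(x, input_pos, draft_mask)
```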
Thanks, I tested your impl using Llama3.1-8b-Instruct and the [eagle model](https://huggingface.co/yuhuili/EAGLE-LLaMA3.1-Instruct-8B). With ``export SGLANG_ENABLE_SPEC_V2=0`` set, the response satisfies ``r"^user@example\.com$"``. With ``export SGLANG_ENABLE_SPEC_V2=1`` set, the response is ``use the following information...
> > Thanks, I tested your impl using Llama3.1-8b-Instruct and the [eagle model](https://huggingface.co/yuhuili/EAGLE-LLaMA3.1-Instruct-8B). With `export SGLANG_ENABLE_SPEC_V2=0` set, the response satisfies `r"^user@example\.com$"`. With `export SGLANG_ENABLE_SPEC_V2=1` set, the response is `use the...
My main finding is that because of the spec v2 overlap, the grammar is not updated immediately after prefilling, but only after the first decode. This results in the grammar...
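A schematic of that ordering (hypothetical names, not sglang's actual scheduler code), assuming the grammar backend exposes an accept/mask style interface:

```python
# Hypothetical sketch: `accept_token` / `allowed_token_mask` stand in for
# whatever the real grammar backend exposes.

def non_overlap_flow(grammar, prefill_token, decode_step):
    grammar.accept_token(prefill_token)      # grammar advanced right after prefill
    mask = grammar.allowed_token_mask()      # first decode is already constrained
    return decode_step(mask)

def overlap_flow(grammar, prefill_token, decode_step):
    mask = grammar.allowed_token_mask()      # stale: prefill token not consumed yet
    out = decode_step(mask)                  # first decode runs effectively unconstrained
    grammar.accept_token(prefill_token)      # update only lands after the first decode
    return out
```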
I implemented a runnable version with no polish in https://github.com/sgl-project/sglang/pull/13441/files. However, I haven't conducted thorough testing or performance analysis yet. I tested it with the following code, and its result is OK....
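A minimal sketch of such a check, assuming an sglang server is already running locally with the target model plus the EAGLE draft model, and that the native ``/generate`` endpoint's regex-constrained sampling is used; the port and field names follow sglang defaults but should be treated as assumptions:

```python
import requests

# Assumes `python -m sglang.launch_server` was started beforehand with the
# target model and the EAGLE draft model, listening on the default port 30000.
resp = requests.post(
    "http://127.0.0.1:30000/generate",
    json={
        "text": "Generate an email address:",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 32,
            "regex": r"^user@example\.com$",
        },
    },
)
print(resp.json()["text"])  # should match the regex once the grammar fix is in place
```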
### My design is as follows:

Take `question: "Generate an email address:"` and `grammar: "^user@example\.com$"` as an example. The original Spec V2's overlap design handles the process as follows:

1. ...
@Ubospica Can you help review the impl?
Not storing the mapping directly uses less space: only O(target_vocab_size) + O(draft_vocab_size). Furthermore, these operations can be performed directly on tensors, resulting in high computational efficiency.

> [SpecForge/specforge/data/preprocessing.py](https://github.com/sgl-project/SpecForge/blob/d3472dde5d6828e60e7ee766ded74754e5dc6778/specforge/data/preprocessing.py#L588)...
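A minimal sketch of the space argument, assuming an EAGLE-3 style reduced draft vocabulary with a `d2t`/`t2d` layout; the tensor names and sizes are illustrative, not the actual SpecForge code:

```python
import torch

target_vocab_size, draft_vocab_size = 128256, 32000

# Target-vocab ids kept in the draft vocabulary (sorted); placeholder data.
draft_token_ids = torch.arange(draft_vocab_size) * 4

# d2t: O(draft_vocab_size) offsets, so target_id = draft_id + d2t[draft_id].
d2t = draft_token_ids - torch.arange(draft_vocab_size)

# t2d: O(target_vocab_size) boolean mask marking which target tokens exist
# in the draft vocabulary.
t2d = torch.zeros(target_vocab_size, dtype=torch.bool)
t2d[draft_token_ids] = True

# Both directions are plain tensor indexing, no per-token Python dict lookups.
draft_ids = torch.tensor([0, 5, 100])
target_ids = draft_ids + d2t[draft_ids]        # draft -> target
in_draft = t2d[torch.tensor([0, 7, 400])]      # target -> "covered by draft vocab?"
```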
I understand what you mean. I don't think the -i operation here has any impact, but if you modify it, you'll also need to modify other places that call it,...
Maybe you can add a parameter in `scripts/train_eagle3.py`:

```
--attention-backend fa3
```
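A hypothetical sketch of how such a flag could be wired into an argparse-based training script; the option name matches the suggestion above, but the default and choices are assumptions, and the real `scripts/train_eagle3.py` may already structure its arguments differently:

```python
import argparse

parser = argparse.ArgumentParser(description="EAGLE-3 training (sketch)")
parser.add_argument(
    "--attention-backend",
    type=str,
    default="sdpa",                             # hypothetical default
    choices=["sdpa", "flex_attention", "fa3"],  # hypothetical set of backends
    help="Attention kernel backend to use during training, e.g. fa3 for FlashAttention-3.",
)
args = parser.parse_args()
print(f"Using attention backend: {args.attention_backend}")
```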