jiapingW
You need to modify some of the forward parameters. In gpt-fast's speculative sampling, the forward parameters of `Transformer` and `model_forward` are not aligned, so you have to modify them as...
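For reference, a rough sketch of the kind of alignment meant here, assuming gpt-fast's `model_forward` wrapper in `generate.py`; the extra `draft_mask` argument is purely hypothetical and only illustrates threading a new parameter through both signatures:

```python
import torch
from typing import Optional

# gpt-fast compiles a thin wrapper around the model call; its positional
# arguments must match what Transformer.forward expects, otherwise the
# speculative-decoding call sites pass arguments into the wrong slots.
def model_forward(model, x: torch.Tensor, input_pos: torch.Tensor) -> torch.Tensor:
    return model(x, input_pos)

# If Transformer.forward is extended, e.g.
#   def forward(self, idx, input_pos=None, draft_mask=None): ...
# the wrapper (and every call in speculative_decode / decode_one_token)
# has to be updated the same way. `draft_mask` is a hypothetical example.
def model_forward_aligned(model, x, input_pos, draft_mask: Optional[torch.Tensor] = None):
    return model(x, input_pos, draft_mask)
```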
Thanks, I tested your impl using Llama3.1-8b-Instruct and the [eagle model](https://huggingface.co/yuhuili/EAGLE-LLaMA3.1-Instruct-8B). With ``export SGLANG_ENABLE_SPEC_V2=0`` set, the response satisfies ``r"^user@example\.com$"``. With ``export SGLANG_ENABLE_SPEC_V2=1`` set, the response is ``use the following information...
> > Thanks, I tested your impl using Llama3.1-8b-Instruct and the [eagle model](https://huggingface.co/yuhuili/EAGLE-LLaMA3.1-Instruct-8B). With `export SGLANG_ENABLE_SPEC_V2=0` set, the response satisfies `r"^user@example\.com$"`. With `export SGLANG_ENABLE_SPEC_V2=1` set, the response is `use the...
My main finding is that because of the spec v2 overlap, the grammar is not updated immediately after prefilling, but only after the first decode. This results in the grammar...
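A schematic of that ordering (hypothetical names, not sglang's actual scheduler code), assuming the grammar backend exposes an accept/mask style interface:

```python
# Hypothetical sketch: `accept_token` / `allowed_token_mask` stand in for
# whatever the real grammar backend exposes.

def non_overlap_flow(grammar, prefill_token, decode_step):
    grammar.accept_token(prefill_token)      # grammar advanced right after prefill
    mask = grammar.allowed_token_mask()      # first decode is already constrained
    return decode_step(mask)

def overlap_flow(grammar, prefill_token, decode_step):
    mask = grammar.allowed_token_mask()      # stale: prefill token not consumed yet
    out = decode_step(mask)                  # first decode runs effectively unconstrained
    grammar.accept_token(prefill_token)      # update only lands after the first decode
    return out
```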
I implemented a runnable version with no polish in https://github.com/sgl-project/sglang/pull/13441/files. However, I haven't conducted thorough testing or performance analysis yet. I tested it with the following code, and its result is OK....
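A minimal sketch of such a check, assuming an sglang server is already running locally with the target model plus the EAGLE draft model, and that the native ``/generate`` endpoint's regex-constrained sampling is used; the port and field names follow sglang defaults but should be treated as assumptions:

```python
import requests

# Assumes `python -m sglang.launch_server` was started beforehand with the
# target model and the EAGLE draft model, listening on the default port 30000.
resp = requests.post(
    "http://127.0.0.1:30000/generate",
    json={
        "text": "Generate an email address:",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 32,
            "regex": r"^user@example\.com$",
        },
    },
)
print(resp.json()["text"])  # should match the regex once the grammar fix is in place
```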
### My design is as follows:

Take `question: "Generate an email address:"` and `grammar: "^user@example\.com$"` as an example. The original Spec V2's overlap design handles the process as follows:

1. ...
@Ubospica Can you help review the impl?
Not storing the mapping directly uses less space: only O(target_vocab_size) + O(draft_vocab_size). Furthermore, these operations can be performed directly on tensors, resulting in high computational efficiency.

> [SpecForge/specforge/data/preprocessing.py](https://github.com/sgl-project/SpecForge/blob/d3472dde5d6828e60e7ee766ded74754e5dc6778/specforge/data/preprocessing.py#L588)...
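A minimal sketch of the space argument, assuming an EAGLE-3 style reduced draft vocabulary with a `d2t`/`t2d` layout; the tensor names and sizes are illustrative, not the actual SpecForge code:

```python
import torch

target_vocab_size, draft_vocab_size = 128256, 32000

# Target-vocab ids kept in the draft vocabulary (sorted); placeholder data.
draft_token_ids = torch.arange(draft_vocab_size) * 4

# d2t: O(draft_vocab_size) offsets, so target_id = draft_id + d2t[draft_id].
d2t = draft_token_ids - torch.arange(draft_vocab_size)

# t2d: O(target_vocab_size) boolean mask marking which target tokens exist
# in the draft vocabulary.
t2d = torch.zeros(target_vocab_size, dtype=torch.bool)
t2d[draft_token_ids] = True

# Both directions are plain tensor indexing, no per-token Python dict lookups.
draft_ids = torch.tensor([0, 5, 100])
target_ids = draft_ids + d2t[draft_ids]        # draft -> target
in_draft = t2d[torch.tensor([0, 7, 400])]      # target -> "covered by draft vocab?"
```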
I understand what you mean. I don't think the -i operation here has any impact, but if you modify it, you'll also need to modify other places that call it,...
Maybe you can add a parameter in `scripts/train_eagle3.py`:

```
--attention-backend fa3
```
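A hypothetical sketch of how such a flag could be wired into an argparse-based training script; the option name matches the suggestion above, but the default and choices are assumptions, and the real `scripts/train_eagle3.py` may already structure its arguments differently:

```python
import argparse

parser = argparse.ArgumentParser(description="EAGLE-3 training (sketch)")
parser.add_argument(
    "--attention-backend",
    type=str,
    default="sdpa",                             # hypothetical default
    choices=["sdpa", "flex_attention", "fa3"],  # hypothetical set of backends
    help="Attention kernel backend to use during training, e.g. fa3 for FlashAttention-3.",
)
args = parser.parse_args()
print(f"Using attention backend: {args.attention_backend}")
```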