jiapingW
Regarding the memory issue, you might try using the `--sglang-mem-fraction-static` parameter to reduce memory usage. Since I train very large models offline, these are the only suggestions I have.
> I met the same error, but I want to train Qwen3-4B.

You can use the hf backend.
I think the implementation of adding an ignore_token is concise and reasonable. That's great!
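For anyone following along, here's a minimal sketch of how an ignore token typically works in the loss: positions labeled with the sentinel are skipped by cross-entropy, so they contribute no gradient. The names (`IGNORE_TOKEN_ID`, `masked_ce_loss`) and the PyTorch framing are illustrative, not necessarily what this PR actually uses.

```python
import torch
import torch.nn.functional as F

IGNORE_TOKEN_ID = -100  # illustrative sentinel; the actual value may differ

def masked_ce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy that skips positions labeled with the ignore token."""
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=IGNORE_TOKEN_ID,  # masked positions contribute no gradient
    )

# Example: mask out prompt tokens so only response tokens are trained on.
logits = torch.randn(1, 4, 32000)            # (batch, seq, vocab)
labels = torch.tensor([[IGNORE_TOKEN_ID, IGNORE_TOKEN_ID, 11, 42]])
loss = masked_ce_loss(logits, labels)
```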
Do you mean inheriting the sglang model and using it as the target model for inference to generate hidden states? Or do you mean that if you implement Eagle3 (e.g., llama3)...
My understanding of TTT (training-time test) is that it's used to align inference and training. (See this paper: https://arxiv.org/abs/2408.15766.) It's not a form of data augmentation. Without TTT, the model doesn't...
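To make the idea concrete, here's a minimal PyTorch sketch of training-time test, assuming an Eagle-style draft head: the draft model is unrolled on its own predicted hidden states for several steps during training, so the training loop mirrors the autoregressive drafting loop used at inference. `ToyDraft` and `ttt_loss` are illustrative stand-ins, not the paper's or SpecForge's actual code.

```python
import torch
import torch.nn as nn

class ToyDraft(nn.Module):
    """Stand-in for an Eagle-style draft head (illustrative only)."""
    def __init__(self, d: int, vocab: int):
        super().__init__()
        self.proj = nn.Linear(d, d)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, hidden):
        pred_hidden = self.proj(hidden)
        return pred_hidden, self.lm_head(pred_hidden)

def ttt_loss(draft, target_hidden, labels, steps: int = 3):
    """Unroll the draft on its own predicted hidden states, accumulating
    loss at each step. Step 0 conditions on the target model's hidden
    states (teacher forcing); later steps condition on the draft's own
    predictions, matching autoregressive drafting at inference.
    Note: real training shifts the labels one position per unroll step;
    they are kept fixed here for brevity."""
    hidden, total = target_hidden, 0.0
    for _ in range(steps):
        hidden, logits = draft(hidden)  # feed own prediction back in
        total = total + nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1)
        )
    return total / steps

draft = ToyDraft(d=64, vocab=100)
h = torch.randn(2, 8, 64)                 # target-model hidden states
y = torch.randint(0, 100, (2, 8))         # next-token labels
loss = ttt_loss(draft, h, y)
loss.backward()
```

Without the unroll, the draft only ever sees ground-truth hidden states during training, but at inference it must consume its own (noisier) predictions; that mismatch is exactly what TTT removes.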
That's an insightful way to look at it. I agree that Eagle3 training is a form of knowledge distillation. Whether the training objective aligns logits or features, from a high-level perspective,...
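As a rough illustration of those two objectives, here's a sketch of generic distillation losses in PyTorch: logit alignment as a temperature-softened KL divergence, and feature alignment as a smooth L1 penalty on hidden states. These are standard formulations, not necessarily the exact losses Eagle3 uses.

```python
import torch
import torch.nn.functional as F

def logit_distill_loss(student_logits, teacher_logits, T: float = 1.0):
    """Align output distributions: KL(teacher || student) on softened logits."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

def feature_distill_loss(student_hidden, teacher_hidden):
    """Align intermediate features directly, e.g. with a smooth L1 penalty."""
    return F.smooth_l1_loss(student_hidden, teacher_hidden)

# Both push the draft toward the target model's behavior; they differ only
# in which layer of the teacher's computation is matched.
s_logits, t_logits = torch.randn(4, 100), torch.randn(4, 100)
s_h, t_h = torch.randn(4, 64), torch.randn(4, 64)
loss = logit_distill_loss(s_logits, t_logits) + feature_distill_loss(s_h, t_h)
```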
I'm currently testing this feature, and if it doesn't work, I'll try to support it.
I tested online training using qwen2.5-7b-awq as the target model with the sglang backend. It is trainable, but I haven't tested its performance yet.
I have fixed it.
Thank you for your feedback. I'll test it today.