jiapingW
Regarding the memory issue, you might try using the `--sglang-mem-fraction-static` parameter to reduce memory usage. Since I train very large models offline, these are the only suggestions I have.
> I met the same error, but I want to train Qwen3-4B.

You can use the hf backend.
I think the implementation of adding an ignore_token is concise and reasonable. That's great!
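For anyone following along, here's a minimal sketch of how an ignore token typically works in the loss: positions labeled with the sentinel are skipped by cross-entropy, so they contribute no gradient. The names (`IGNORE_TOKEN_ID`, `masked_ce_loss`) and the PyTorch framing are illustrative, not necessarily what this PR actually uses.

```python
import torch
import torch.nn.functional as F

IGNORE_TOKEN_ID = -100  # illustrative sentinel; the actual value may differ

def masked_ce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy that skips positions labeled with the ignore token."""
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=IGNORE_TOKEN_ID,  # masked positions contribute no gradient
    )

# Example: mask out prompt tokens so only response tokens are trained on.
logits = torch.randn(1, 4, 32000)            # (batch, seq, vocab)
labels = torch.tensor([[IGNORE_TOKEN_ID, IGNORE_TOKEN_ID, 11, 42]])
loss = masked_ce_loss(logits, labels)
```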
Do you mean inheriting the sglang model and using it as the target model for inference to generate hidden states? Or do you mean that if you implement Eagle3 (e.g., llama3)...
My understanding of TTT (training-time test) is that it's used to align inference and training. (See this paper: https://arxiv.org/abs/2408.15766.) It's not a form of data augmentation. Without TTT, the model doesn't...
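To make the idea concrete, here's a minimal PyTorch sketch of training-time test, assuming an Eagle-style draft head: the draft model is unrolled on its own predicted hidden states for several steps during training, so the training loop mirrors the autoregressive drafting loop used at inference. `ToyDraft` and `ttt_loss` are illustrative stand-ins, not the paper's or SpecForge's actual code.

```python
import torch
import torch.nn as nn

class ToyDraft(nn.Module):
    """Stand-in for an Eagle-style draft head (illustrative only)."""
    def __init__(self, d: int, vocab: int):
        super().__init__()
        self.proj = nn.Linear(d, d)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, hidden):
        pred_hidden = self.proj(hidden)
        return pred_hidden, self.lm_head(pred_hidden)

def ttt_loss(draft, target_hidden, labels, steps: int = 3):
    """Unroll the draft on its own predicted hidden states, accumulating
    loss at each step. Step 0 conditions on the target model's hidden
    states (teacher forcing); later steps condition on the draft's own
    predictions, matching autoregressive drafting at inference.
    Note: real training shifts the labels one position per unroll step;
    they are kept fixed here for brevity."""
    hidden, total = target_hidden, 0.0
    for _ in range(steps):
        hidden, logits = draft(hidden)  # feed own prediction back in
        total = total + nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1)
        )
    return total / steps

draft = ToyDraft(d=64, vocab=100)
h = torch.randn(2, 8, 64)                 # target-model hidden states
y = torch.randint(0, 100, (2, 8))         # next-token labels
loss = ttt_loss(draft, h, y)
loss.backward()
```

Without the unroll, the draft only ever sees ground-truth hidden states during training, but at inference it must consume its own (noisier) predictions; that mismatch is exactly what TTT removes.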
That's an insightful way to look at it. I agree that Eagle3 training is a form of knowledge distillation. Whether the training objective aligns logits or features, from a high-level perspective,...
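As a rough illustration of those two objectives, here's a sketch of generic distillation losses in PyTorch: logit alignment as a temperature-softened KL divergence, and feature alignment as a smooth L1 penalty on hidden states. These are standard formulations, not necessarily the exact losses Eagle3 uses.

```python
import torch
import torch.nn.functional as F

def logit_distill_loss(student_logits, teacher_logits, T: float = 1.0):
    """Align output distributions: KL(teacher || student) on softened logits."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

def feature_distill_loss(student_hidden, teacher_hidden):
    """Align intermediate features directly, e.g. with a smooth L1 penalty."""
    return F.smooth_l1_loss(student_hidden, teacher_hidden)

# Both push the draft toward the target model's behavior; they differ only
# in which layer of the teacher's computation is matched.
s_logits, t_logits = torch.randn(4, 100), torch.randn(4, 100)
s_h, t_h = torch.randn(4, 64), torch.randn(4, 64)
loss = logit_distill_loss(s_logits, t_logits) + feature_distill_loss(s_h, t_h)
```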
I'm currently testing this feature, and if it doesn't work, I'll try to support it.
I tested online training using qwen2.5-7b-awq as the target model with the sglang backend. It is trainable, but I haven't tested its performance yet.
I have fixed it.
Thank you for your feedback. I'll test it today.