yd-oom
@hjh0119 I also ran into this problem. --deepspeed zero2 --teacher_deepspeed zero3 works in 3.10.0 but OOMs in 3.10.1 for GKD training. Did anything change between 3.10.0 and 3.10.1 that...
@Qin10 I updated the save_pretrained method in the Eagle3DraftModel base class (specforge/modeling/draft/base.py). You can try the new #117.
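For readers who can't open the PR: below is a minimal sketch of what a distributed-safe save_pretrained override can look like. This is not the actual #117 patch; the rank-0-only write, the CPU offload, and the barrier are assumptions about one common shape for this kind of fix.

```python
# Minimal sketch, NOT the actual #117 patch: write the draft model's
# weights from rank 0 only so every rank doesn't race on the same file.
import os
import torch
import torch.distributed as dist

class Eagle3DraftModel(torch.nn.Module):
    def save_pretrained(self, save_directory: str) -> None:
        # Move tensors to CPU so the checkpoint is device-agnostic.
        state_dict = {k: v.cpu() for k, v in self.state_dict().items()}
        if not dist.is_initialized() or dist.get_rank() == 0:
            os.makedirs(save_directory, exist_ok=True)
            torch.save(state_dict,
                       os.path.join(save_directory, "pytorch_model.bin"))
        if dist.is_initialized():
            dist.barrier()  # keep ranks in sync before training resumes
```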
Same problem. I also hit OOM under offline mode. Offline mode should not be related to the target model; maybe add TP on the draft model?
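To illustrate what TP on the draft model could mean, here is a hypothetical column-parallel linear layer: each rank holds a slice of the weight so no single GPU stores the full matrix. ColumnParallelLinear is my own name, not SpecForge code, and a real integration would also shard attention and use an autograd-aware all-gather (e.g. torch.distributed.nn) for training.

```python
# Hypothetical sketch: column-parallel linear. Each rank holds
# out_features / world_size output columns; an all-gather reassembles
# the full activation after the local matmul.
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0, "out_features must divide world size"
        self.local = torch.nn.Linear(in_features, out_features // world, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shard = self.local(x)  # [..., out_features / world]
        parts = [torch.empty_like(shard) for _ in range(dist.get_world_size())]
        dist.all_gather(parts, shard)
        # Plain all_gather does not backprop; re-inserting the local shard
        # preserves the gradient path for this rank's slice only.
        parts[dist.get_rank()] = shard
        return torch.cat(parts, dim=-1)  # [..., out_features]
```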
I already tried that FlashAttention PR; #103 does not work for my case.
@zyksir Hi, conflicts resolved. This was tested on Llama 3.1 8B. The results with TP=2 are identical to the baseline (non-TP) after two epochs on ShareGPT. Our team has been...