yd-oom
@hjh0119 I also ran into this problem. --deepspeed zero2 --teacher_deepspeed zero3 works in 3.10.0 but OOMs in 3.10.1 for GKD training. Did anything change between 3.10.0 and 3.10.1 that...
@Qin10 I updated the save_pretrained method in the Eagle3DraftModel base class (specforge/modeling/draft/base.py). You can try the new #117.
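For readers who can't open the PR: below is a minimal sketch of what a distributed-safe save_pretrained override can look like. This is not the actual #117 patch; the rank-0-only write, the CPU offload, and the barrier are assumptions about one common shape for this kind of fix.

```python
# Minimal sketch, NOT the actual #117 patch: write the draft model's
# weights from rank 0 only so every rank doesn't race on the same file.
import os
import torch
import torch.distributed as dist

class Eagle3DraftModel(torch.nn.Module):
    def save_pretrained(self, save_directory: str) -> None:
        # Move tensors to CPU so the checkpoint is device-agnostic.
        state_dict = {k: v.cpu() for k, v in self.state_dict().items()}
        if not dist.is_initialized() or dist.get_rank() == 0:
            os.makedirs(save_directory, exist_ok=True)
            torch.save(state_dict,
                       os.path.join(save_directory, "pytorch_model.bin"))
        if dist.is_initialized():
            dist.barrier()  # keep ranks in sync before training resumes
```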
Same problem. I also hit OOM under offline mode. Offline mode should not be related to the target model; maybe add TP on the draft model?
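To illustrate what TP on the draft model could mean, here is a hypothetical column-parallel linear layer: each rank holds a slice of the weight so no single GPU stores the full matrix. ColumnParallelLinear is my own name, not SpecForge code, and a real integration would also shard attention and use an autograd-aware all-gather (e.g. torch.distributed.nn) for training.

```python
# Hypothetical sketch: column-parallel linear. Each rank holds
# out_features / world_size output columns; an all-gather reassembles
# the full activation after the local matmul.
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0, "out_features must divide world size"
        self.local = torch.nn.Linear(in_features, out_features // world, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shard = self.local(x)  # [..., out_features / world]
        parts = [torch.empty_like(shard) for _ in range(dist.get_world_size())]
        dist.all_gather(parts, shard)
        # Plain all_gather does not backprop; re-inserting the local shard
        # preserves the gradient path for this rank's slice only.
        parts[dist.get_rank()] = shard
        return torch.cat(parts, dim=-1)  # [..., out_features]
```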
I already tried that FlashAttention PR; #103 does not work for my case.
@zyksir Hi, conflicts resolved. This was tested on Llama 3.1 8B. The results with TP=2 are identical to the baseline (non-TP) after two epochs on ShareGPT. Our team has been...