Jay Desai
Wow, this started in May and still hasn't been closed, the DeepSpeed folks are really slow!
I can't even train a 3B model with the same config posted here.
I'm likely doing something wrong. @LuJunru, do you have your training code on git?
Thanks, I was trying DeepSpeed stages 1 and 2; I'll try out FSDP in the Trainer too.
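For reference, turning FSDP on through the Trainer looks roughly like this (a minimal sketch; the `T5Block` wrap class is my assumption for Flan-T5, and the other arguments are placeholders):

```python
from transformers import TrainingArguments

# Sketch: enable PyTorch FSDP via the Trainer instead of DeepSpeed.
# "full_shard auto_wrap" shards params/grads/optimizer state and auto-wraps
# transformer layers; T5Block is the layer class used by T5/Flan-T5 checkpoints.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    bf16=True,
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="T5Block",
)
```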
+1, getting the same error.
The config is loaded from https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/configs/ds_flan_t5_z3_config.json

```python
training_args = TrainingArguments(
    output_dir=f"./results/{question_name}_{output_dir_suffix}",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    # auto_find_batch_size=True,
    num_train_epochs=epochs,
    weight_decay=0.02,
    warmup_steps=warmup_steps,  # 1 epoch = 1530/16 -- 95 steps
    lr_scheduler_type='linear',
    optim='adamw_torch',
    evaluation_strategy='epoch',
    # save_strategy='epoch',
    save_steps=eval_steps,
    logging_steps=eval_steps,
    eval_steps=eval_steps,
    ...
```
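In case it's not obvious from the snippet, the linked ZeRO-3 JSON is wired into the Trainer through the `deepspeed` argument of `TrainingArguments`, roughly like this (a sketch; the local path is an assumption):

```python
from transformers import TrainingArguments

# Sketch: hand the DeepSpeed ZeRO-3 config file to the Trainer.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    deepspeed="configs/ds_flan_t5_z3_config.json",  # assumed local copy of the linked JSON
)
```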
The error happens when using PEFT with Flan.
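For anyone trying to reproduce, a minimal PEFT-on-Flan setup looks roughly like this (LoRA shown as an example, not necessarily the exact config hitting the error; the checkpoint and hyperparameters are assumptions):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Sketch: wrap a Flan-T5 checkpoint with a PEFT adapter (LoRA here;
# swap in whatever PeftConfig you are using, e.g. prefix tuning).
model_name = "google/flan-t5-xl"  # assumed ~3B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable
```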
Haven't, I'm using regular inference without DeepSpeed.
> Hi @djaym7, I apologize for the delayed response.
>
> I have tried to reproduce the problem using both deepspeed and PEFT (prefix tuning) but haven't seen the same...