Jay Desai
Wow, this started in May and still hasn't been closed, the DeepSpeed folks are really slow!
I can't even train a 3B model with the same config posted here.
I'm likely doing something wrong. @LuJunru, do you have your training code on git?
Thanks, I was trying DeepSpeed stages 1 and 2; I'll try out FSDP in the Trainer too.
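For reference, turning FSDP on through the Trainer looks roughly like this (a minimal sketch; the `T5Block` wrap class is my assumption for Flan-T5, and the other arguments are placeholders):

```python
from transformers import TrainingArguments

# Sketch: enable PyTorch FSDP via the Trainer instead of DeepSpeed.
# "full_shard auto_wrap" shards params/grads/optimizer state and auto-wraps
# transformer layers; T5Block is the layer class used by T5/Flan-T5 checkpoints.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    bf16=True,
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="T5Block",
)
```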
+1, getting the same error.
The config is loaded from https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/configs/ds_flan_t5_z3_config.json

```python
training_args = TrainingArguments(
    output_dir=f"./results/{question_name}_{output_dir_suffix}",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    # auto_find_batch_size=True,
    num_train_epochs=epochs,
    weight_decay=0.02,
    warmup_steps=warmup_steps,  # 1 epoch = 1530/16 -- 95 steps
    lr_scheduler_type='linear',
    optim='adamw_torch',
    evaluation_strategy='epoch',
    # save_strategy='epoch',
    save_steps=eval_steps,
    logging_steps=eval_steps,
    eval_steps=eval_steps,
    ...
```
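In case it's not obvious from the snippet, the linked ZeRO-3 JSON is wired into the Trainer through the `deepspeed` argument of `TrainingArguments`, roughly like this (a sketch; the local path is an assumption):

```python
from transformers import TrainingArguments

# Sketch: hand the DeepSpeed ZeRO-3 config file to the Trainer.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    deepspeed="configs/ds_flan_t5_z3_config.json",  # assumed local copy of the linked JSON
)
```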
The error happens when using PEFT with Flan.
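For anyone trying to reproduce, a minimal PEFT-on-Flan setup looks roughly like this (LoRA shown as an example, not necessarily the exact config hitting the error; the checkpoint and hyperparameters are assumptions):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Sketch: wrap a Flan-T5 checkpoint with a PEFT adapter (LoRA here;
# swap in whatever PeftConfig you are using, e.g. prefix tuning).
model_name = "google/flan-t5-xl"  # assumed ~3B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable
```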
Haven't, I'm using regular inference without DeepSpeed.
> Hi @djaym7, I apologize for the delayed response.
>
> I have tried to reproduce the problem using both deepspeed and PEFT (prefix tuning) but haven't seen the same...