hahmad2008
Same here with Mistral; the output is empty.
@winglian But I full finetuned the same TinyLlama model using fp16 with DeepSpeed ZeRO-2, and there was no problem with it, no NaN weights.
@winglian Btw, I am using the Docker version with the following package versions: cuda: 11.8, pytorch: 2.0.1+cu118, accelerate: 0.24.0.dev0, transformers: 4.35.0.dev0
@winglian I changed the learning rate to learning_rate: 0.000002, and the loss still drops to ZERO.
@winglian Any ideas, please?
@NanoCode012 I am using 2 x Tesla T4 GPUs with fp16.
@NanoCode012 Thanks, I will give it a try and will come back to you.
@NanoCode012 @winglian I tried bf16 on an A10 GPU and the training loss was stable, but with fp16 it was not stable; the loss was jumping to zero and the weight...
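The bf16-stable / fp16-unstable behavior is consistent with fp16's narrow dynamic range: float16 overflows to inf above ~65504, while bfloat16 keeps float32's exponent range, so large activations or gradients that survive in bf16 become inf in fp16 and then poison the weights as NaN. A minimal sketch (the cast helpers here are hypothetical simulations of the overflow behavior, not real framework calls):

```python
import math

FP16_MAX = 65504.0   # largest finite float16 value (IEEE 754 binary16)
BF16_MAX = 3.39e38   # bfloat16 shares float32's 8-bit exponent range

def cast_fp16(x: float) -> float:
    """Simulate casting to float16: magnitudes beyond the format overflow to inf."""
    return math.copysign(math.inf, x) if abs(x) > FP16_MAX else x

def cast_bf16(x: float) -> float:
    """Simulate casting to bfloat16: the range is roughly that of float32."""
    return math.copysign(math.inf, x) if abs(x) > BF16_MAX else x

grad = 7.2e4  # a magnitude that is routine in fp32/bf16 training
print(cast_fp16(grad))  # inf -> the next weight update produces NaN weights
print(cast_bf16(grad))  # 72000.0 -> representable, training stays stable
```

This is why the same run can be fine with bf16 (or fp16 plus aggressive loss scaling / gradient clipping) but collapse with plain fp16 on T4s, which lack bf16 support.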
@NanoCode012 Not at all.
@lintangsutawika I used the main branch and the issue is still there; issue opened: https://github.com/EleutherAI/lm-evaluation-harness/issues/1340