hahmad2008
Same here with Mistral; the output is empty.
@winglian But I full finetuned the same TinyLlama model using fp16 with DeepSpeed ZeRO-2, and there was no problem with it, no NaN weights.
@winglian Btw, I am using the Docker version with the following package versions: cuda: 11.8, pytorch: 2.0.1+cu118, accelerate: 0.24.0.dev0, transformers: 4.35.0.dev0
@winglian I changed the learning rate to learning_rate: 0.000002, and the loss still drops to ZERO.
@winglian Any ideas, please?
@NanoCode012 I am using 2 x Tesla T4 GPUs with fp16.
@NanoCode012 Thanks, I will give it a try and will come back to you.
@NanoCode012 @winglian I tried bf16 on an A10 GPU and the training loss was stable, but with fp16 it was not stable; the loss was jumping to zero and the weight...
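The bf16-stable / fp16-unstable behavior is consistent with fp16's narrow dynamic range: float16 overflows to inf above ~65504, while bfloat16 keeps float32's exponent range, so large activations or gradients that survive in bf16 become inf in fp16 and then poison the weights as NaN. A minimal sketch (the cast helpers here are hypothetical simulations of the overflow behavior, not real framework calls):

```python
import math

FP16_MAX = 65504.0   # largest finite float16 value (IEEE 754 binary16)
BF16_MAX = 3.39e38   # bfloat16 shares float32's 8-bit exponent range

def cast_fp16(x: float) -> float:
    """Simulate casting to float16: magnitudes beyond the format overflow to inf."""
    return math.copysign(math.inf, x) if abs(x) > FP16_MAX else x

def cast_bf16(x: float) -> float:
    """Simulate casting to bfloat16: the range is roughly that of float32."""
    return math.copysign(math.inf, x) if abs(x) > BF16_MAX else x

grad = 7.2e4  # a magnitude that is routine in fp32/bf16 training
print(cast_fp16(grad))  # inf -> the next weight update produces NaN weights
print(cast_bf16(grad))  # 72000.0 -> representable, training stays stable
```

This is why the same run can be fine with bf16 (or fp16 plus aggressive loss scaling / gradient clipping) but collapse with plain fp16 on T4s, which lack bf16 support.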
@NanoCode012 Not at all.
@lintangsutawika I used the main branch and the issue is still there; issue opened: https://github.com/EleutherAI/lm-evaluation-harness/issues/1340