hahmad2008

Results: 48 comments of hahmad2008

Same here with Mistral: the output is empty.

@winglian But I fully fine-tuned the same TinyLlama model using fp16 with DeepSpeed ZeRO-2, and there was no problem with it: no NaN weights.
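
For reference, this is roughly how the weights can be checked after training; a minimal sketch assuming a standard PyTorch model (the `model` variable here is hypothetical):

```python
import torch

def find_nan_params(model: torch.nn.Module) -> list[str]:
    """Return the names of parameters that contain NaN values."""
    return [
        name
        for name, param in model.named_parameters()
        if torch.isnan(param).any()
    ]

# An empty result means no NaN weights, e.g.:
# assert find_nan_params(model) == []
```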

@winglian BTW, I am using the Docker version with the following package versions: CUDA 11.8, PyTorch 2.0.1+cu118, accelerate 0.24.0.dev0, transformers 4.35.0.dev0.

@winglian I changed the learning rate to `learning_rate: 0.000002`, and the loss still becomes ZERO.
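
For anyone reproducing this, the change amounts to lowering the learning rate in the training config; a minimal sketch of the equivalent setting with Hugging Face `TrainingArguments` (not the exact axolotl config, and `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Hypothetical reproduction of the setting above:
# learning rate lowered to 2e-6, fp16 mixed precision enabled.
args = TrainingArguments(
    output_dir="out",     # placeholder path
    learning_rate=2e-6,   # learning_rate: 0.000002 from the config
    fp16=True,            # the mixed-precision mode under test
)
```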

@NanoCode012 I am using 2 x Tesla T4 GPUs with fp16.

@NanoCode012 Thanks, I will give it a try and come back to you.

@NanoCode012 @winglian I tried with bf16 on an A10 GPU and the training loss was stable, but with fp16 it was not: the loss was jumping to zero and the weight...
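
A likely reason bf16 is stable here is its much wider dynamic range; a minimal sketch comparing the two formats with `torch.finfo`:

```python
import torch

# bf16 keeps fp32's 8 exponent bits, so it can represent far larger
# magnitudes than fp16 before overflowing to inf, which is one common
# way fp16 training produces NaN weights and a loss that jumps to zero.
for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(f"{dtype}: max={info.max:.3e}, smallest normal={info.tiny:.3e}")
```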

@lintangsutawika I used the main branch and the issue is still there; issue opened: https://github.com/EleutherAI/lm-evaluation-harness/issues/1340