Hyeonbin Hwang
Me too.
Thanks for the reply! Yes, I have that code above the print statement. I was using accelerate 0.27.2 and deepspeed 0.13.4. I guess I'll downgrade deepspeed to 0.13.3 and try again...
Yup, it works nicely :) One thing that's concerning is that, for some reason, this takes considerably more time than DPOTrainer on the same data. Any ideas why?
Thanks for the comment! Closing this now :)
I'm also curious about this. As an alternative, can we just use the kto_pair loss for now, since there seem to be errors in KTOTrainer?
I'm experiencing the same issue :( It seems like the grad_norm suddenly diverges to infinity after some iterations.