
Gradient clipping not working for llama2_70b_lora benchmark

Open · michal2409 opened this issue 10 months ago • 1 comment

I’ve found that setting max_grad_norm has no effect, and we are not clipping gradients.

To verify, I ran convergence with max_grad_norm set to 1e-9 and saw no difference in eval loss. I also looked at unscale_and_clip_grads and found that self.clip_grad is set to 0 when I printed it here.
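
Not the benchmark's actual implementation, but a minimal PyTorch sketch of the behaviour described above, assuming the clipping step is skipped whenever the internal clip value is 0. The function name `unscale_and_clip_grads_sketch` and the tiny model are hypothetical, for illustration only:

```python
import torch

def unscale_and_clip_grads_sketch(parameters, clip_grad: float):
    """Hypothetical stand-in for the clipping step: clip only when clip_grad > 0."""
    if clip_grad > 0.0:
        # Returns the total gradient norm measured before clipping.
        return torch.nn.utils.clip_grad_norm_(parameters, max_norm=clip_grad)
    # clip_grad == 0 -> gradients are left untouched, matching the symptom above.
    return None

# Sanity check: with an extreme max_grad_norm such as 1e-9, correctly wired
# clipping should shrink the gradient norm by many orders of magnitude.
model = torch.nn.Linear(4, 4)
model(torch.randn(8, 4)).pow(2).sum().backward()

# max_norm=inf leaves gradients unchanged and just reports the current norm.
before = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float("inf")))
unscale_and_clip_grads_sketch(model.parameters(), clip_grad=0.0)   # no-op
unscale_and_clip_grads_sketch(model.parameters(), clip_grad=1e-9)  # clips hard
after = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float("inf")))
print(f"grad norm before: {before:.3e}, after: {after:.3e}")
```

If the gradient norm stays unchanged even with an extreme value like 1e-9 wired through the real config path, that would confirm the configured max_grad_norm is not reaching the optimizer.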

michal2409 · Mar 27 '24

Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variance in the RCPs.

nv-rborkar · Mar 28 '24