Michal Futrega

2 issues by Michal Futrega

I’ve found that setting `max_grad_norm` has no effect: gradients are not being clipped. To verify, I ran a convergence test with `max_grad_norm` set to 1e-9 and saw no difference in eval loss, and...
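The reasoning behind that check can be illustrated with a small, framework-free sketch. It assumes plain SGD and global L2-norm clipping; `clip_by_global_norm` and `sgd_step` are hypothetical helpers written for this illustration, not the trainer's actual code. If clipping with a threshold of 1e-9 were really applied, each update would be vanishingly small and the loss curve would flatten, so an unchanged eval loss suggests the setting is being ignored.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale a flat list of gradients so their global L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns (clipped_grads, original_norm).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


def sgd_step(params, grads, lr, max_norm=None):
    """One plain SGD update, optionally clipping the gradients first."""
    if max_norm is not None:
        grads, _ = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, grads)]


params = [1.0, -2.0, 0.5]
grads = [3.0, 4.0, 12.0]  # global L2 norm is 13

unclipped = sgd_step(params, grads, lr=0.1)
clipped = sgd_step(params, grads, lr=0.1, max_norm=1e-9)

# Largest per-parameter movement with and without clipping.
max_move_unclipped = max(abs(a - b) for a, b in zip(params, unclipped))
max_move_clipped = max(abs(a - b) for a, b in zip(params, clipped))

print(f"unclipped move: {max_move_unclipped:.3e}")
print(f"clipped move:   {max_move_clipped:.3e}")
```

With working clipping, the second number is many orders of magnitude smaller than the first. A run whose eval loss is identical with and without such a tiny threshold is therefore taking unclipped steps.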

# What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

**Collection**: [Note which collection this PR will affect]

# Changelog

-...
