Matthew Davidow


I've uploaded loss-curve data generated with `test_convergence_1b_params.sh` [here](https://github.com/google/maxtext/compare/mattdavidow-convergence-share?expand=1). Here is a screenshot of some learning metrics from that run, which we display via TensorBoard: ![image](https://github.com/google/maxtext/assets/51136315/8d885742-0ad7-400e-9a5a-2ed809f73d2f)

This looks okay to me, but I will reiterate my earlier point, which Raymond also brought up: the API of `multihost_job` no longer supports customization of the curl command...

This is on our roadmap with high priority; we will update here once we start working on it.

That solution, `if config.accumulate_gradient_steps > 1: optimizer = optax.MultiSteps(optimizer, config.accumulate_gradient_steps)`, should work fine. However, we have added gradient accumulation as a config option, which should provide slightly more accurate accumulation,...
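
For context, here is a minimal sketch of the `optax.MultiSteps` approach mentioned above. The base optimizer (`adamw`), the learning rate, the toy loss, and the bare `accumulate_gradient_steps` variable standing in for `config.accumulate_gradient_steps` are all illustrative assumptions, not the repo's actual wiring:

```python
import jax
import jax.numpy as jnp
import optax

# Illustrative stand-ins; in MaxText these would come from config.
accumulate_gradient_steps = 4
learning_rate = 1e-3

optimizer = optax.adamw(learning_rate)
if accumulate_gradient_steps > 1:
    # MultiSteps accumulates gradients across k calls and only applies the
    # wrapped optimizer's update on every k-th call; in between, it emits
    # zero updates so the parameters are left unchanged.
    optimizer = optax.MultiSteps(optimizer, every_k_schedule=accumulate_gradient_steps)

params = {"w": jnp.ones((2,))}
opt_state = optimizer.init(params)

def loss_fn(p, x):
    return jnp.sum((p["w"] * x) ** 2)

# Each iteration processes one micro-batch; a real parameter update
# happens once every `accumulate_gradient_steps` iterations.
for step in range(2 * accumulate_gradient_steps):
    grads = jax.grad(loss_fn)(params, jnp.arange(2.0))
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
```

A `MultiSteps` instance exposes the same `init`/`update` interface as a plain `optax.GradientTransformation`, which is why the conditional wrapping above leaves the rest of the training loop unchanged.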