Matthew Davidow

Results: 20 comments by Matthew Davidow

I've uploaded loss curve data from test_convergence_1b_params.sh [here](https://github.com/google/maxtext/compare/mattdavidow-convergence-share?expand=1). Here is a screenshot of some learning metrics from that run, displayed via TensorBoard: ![image](https://github.com/google/maxtext/assets/51136315/8d885742-0ad7-400e-9a5a-2ed809f73d2f)

This looks okay to me, but I will reiterate my earlier point, which Raymond also brought up: the API of multihost_job no longer supports customization of the curl command...

This is on our roadmap with high priority; will update here once we start working on it.

That solution, `if config.accumulate_gradient_steps > 1: optimizer = optax.MultiSteps(optimizer, config.accumulate_gradient_steps)`, should work fine. However, we have since added gradient accumulation as a config option, which should give slightly more accurate accumulation,...
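For context, here is a minimal pure-Python sketch of what gradient accumulation does: instead of applying every per-microbatch gradient, the optimizer averages `k` of them and applies a single update, which is the effect `optax.MultiSteps` provides. The function name and list-of-floats representation here are purely illustrative, not the MaxText implementation:

```python
def accumulate_gradients(grads, accumulate_steps):
    """Average consecutive groups of `accumulate_steps` gradients.

    `grads` is a flat list of per-microbatch gradients (floats here for
    simplicity; in practice these would be pytrees of arrays). Returns one
    averaged gradient per group, i.e. one optimizer update per
    `accumulate_steps` microbatches.
    """
    assert len(grads) % accumulate_steps == 0, "need a whole number of groups"
    averaged = []
    for i in range(0, len(grads), accumulate_steps):
        chunk = grads[i:i + accumulate_steps]
        averaged.append(sum(chunk) / accumulate_steps)
    return averaged


# Four microbatch gradients, accumulated two at a time
# yield two averaged updates.
print(accumulate_gradients([1.0, 3.0, 2.0, 4.0], 2))
```

This matches the simulated-larger-batch intuition: with `accumulate_steps = k`, the applied update is the mean gradient over `k` microbatches, as if a batch `k` times larger had been used.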

Indeed, https://github.com/google/maxtext/pull/581 added support for this. Out of curiosity, what is your use case for this?

Thank you for finding this!

> Something happened when trying to squash the commits, so I created another PR. The old one is here: #763. Per discussion in that PR, we should keep the `else...

We are still looking into this on the open source side! Likely at least 6 months away.

Yes, this is still something we are considering; it likely requires a paradigm shift (multiple programs, multiple data) to support.