Matthew Davidow
I've uploaded loss curve data generated with test_convergence_1b_params.sh [here](https://github.com/google/maxtext/compare/mattdavidow-convergence-share?expand=1). Here is a screenshot of some learning metrics from that run, displayed via TensorBoard:
This looks okay to me, but I will reiterate my earlier point, which Raymond also brought up: the API of multihost_job no longer supports customization of the curl command...
This is on our roadmap with high priority; will update here once we start working on it.
That solution, `if config.accumulate_gradient_steps > 1: optimizer = optax.MultiSteps(optimizer, config.accumulate_gradient_steps)`, should work fine. However, we have since added gradient accumulation as a config option, which should give slightly more accurate accumulation,...
Indeed https://github.com/google/maxtext/pull/581 was adding support for this. Out of curiosity what is your use case for this?
Thank you for finding this!
Does this cause issues for you during setup.sh?
> Something happened when trying to squash the commits, so I created another PR. The old one is here: #763. Per discussion in that PR, we should keep the `else...
We are still looking into this on the open-source side! Likely at least 6 months away.
Yes, this is still something we are considering; it likely requires a paradigm shift (multiple-program, multiple-data) to support.