Matthew Davidow


I've uploaded loss-curve data generated with `test_convergence_1b_params.sh` [here](https://github.com/google/maxtext/compare/mattdavidow-convergence-share?expand=1). Here is a screenshot of some learning metrics from that run, which we display via TensorBoard: ![image](https://github.com/google/maxtext/assets/51136315/8d885742-0ad7-400e-9a5a-2ed809f73d2f)

This looks okay to me, but I will reiterate my earlier point, which Raymond also brought up: the API of `multihost_job` no longer supports customization of the curl command...

This is on our roadmap with high priority; we will update here once we start working on it.

That solution, `if config.accumulate_gradient_steps > 1: optimizer = optax.MultiSteps(optimizer, config.accumulate_gradient_steps)`, should work fine. However, we have added gradient accumulation as a config option, which should provide slightly more accurate accumulation,...
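
For context, here is a minimal sketch of the `optax.MultiSteps` approach mentioned above. The base optimizer (`adamw`), the learning rate, the toy loss, and the bare `accumulate_gradient_steps` variable standing in for `config.accumulate_gradient_steps` are all illustrative assumptions, not the repo's actual wiring:

```python
import jax
import jax.numpy as jnp
import optax

# Illustrative stand-ins; in MaxText these would come from config.
accumulate_gradient_steps = 4
learning_rate = 1e-3

optimizer = optax.adamw(learning_rate)
if accumulate_gradient_steps > 1:
    # MultiSteps accumulates gradients across k calls and only applies the
    # wrapped optimizer's update on every k-th call; in between, it emits
    # zero updates so the parameters are left unchanged.
    optimizer = optax.MultiSteps(optimizer, every_k_schedule=accumulate_gradient_steps)

params = {"w": jnp.ones((2,))}
opt_state = optimizer.init(params)

def loss_fn(p, x):
    return jnp.sum((p["w"] * x) ** 2)

# Each iteration processes one micro-batch; a real parameter update
# happens once every `accumulate_gradient_steps` iterations.
for step in range(2 * accumulate_gradient_steps):
    grads = jax.grad(loss_fn)(params, jnp.arange(2.0))
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
```

A `MultiSteps` instance exposes the same `init`/`update` interface as a plain `optax.GradientTransformation`, which is why the conditional wrapping above leaves the rest of the training loop unchanged.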