Matthew Davidow

Results: 20 comments by Matthew Davidow

I've uploaded loss curve data from test_convergence_1b_params.sh [here](https://github.com/google/maxtext/compare/mattdavidow-convergence-share?expand=1). Here is a screenshot of some learning metrics from that run, displayed via TensorBoard: ![image](https://github.com/google/maxtext/assets/51136315/8d885742-0ad7-400e-9a5a-2ed809f73d2f)

This looks okay to me, but I will reiterate my earlier point, which Raymond also brought up: the API of multihost_job no longer supports customization of the curl command...

This is on our roadmap with high priority; will update here once we start working on it.

That solution, `if config.accumulate_gradient_steps > 1: optimizer = optax.MultiSteps(optimizer, config.accumulate_gradient_steps)`, should work fine. However, we have since added gradient accumulation as a config option, which should give slightly more accurate accumulation,...
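For context, here is a minimal pure-Python sketch of what gradient accumulation does: instead of applying every per-microbatch gradient, the optimizer averages `k` of them and applies a single update, which is the effect `optax.MultiSteps` provides. The function name and list-of-floats representation here are purely illustrative, not the MaxText implementation:

```python
def accumulate_gradients(grads, accumulate_steps):
    """Average consecutive groups of `accumulate_steps` gradients.

    `grads` is a flat list of per-microbatch gradients (floats here for
    simplicity; in practice these would be pytrees of arrays). Returns one
    averaged gradient per group, i.e. one optimizer update per
    `accumulate_steps` microbatches.
    """
    assert len(grads) % accumulate_steps == 0, "need a whole number of groups"
    averaged = []
    for i in range(0, len(grads), accumulate_steps):
        chunk = grads[i:i + accumulate_steps]
        averaged.append(sum(chunk) / accumulate_steps)
    return averaged


# Four microbatch gradients, accumulated two at a time
# yield two averaged updates.
print(accumulate_gradients([1.0, 3.0, 2.0, 4.0], 2))
```

This matches the simulated-larger-batch intuition: with `accumulate_steps = k`, the applied update is the mean gradient over `k` microbatches, as if a batch `k` times larger had been used.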

Indeed, https://github.com/google/maxtext/pull/581 added support for this. Out of curiosity, what is your use case for this?

Thank you for finding this!

> Something happened when trying to squash the commits, so I created another PR. The old one is here: #763. Per discussion in that PR, we should keep the `else...

We are still looking into this on the open source side! Likely at least 6 months away.

Yes, this is still something we are considering; it likely requires a paradigm shift (multiple programs, multiple data) to support.