maxtext
A simple, performant, and scalable JAX LLM!
Correct the path to Run_Gemma.md in README.md. This file was previously moved to the tpu folder in https://github.com/google/maxtext/commit/d6769933c1f80d741ff662e2e26c2b545837154e.
Change l2norm to use jnp.sqrt instead of **0.5. We see a speedup on small examples: https://screenshot.googleplex.com/A3GjjWQq5Dhes9b Colab notebook: http://shortn/_p369zYcGI2
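For context, a minimal sketch of the before/after shapes of this change (the actual MaxText `l2norm` operates on a pytree of parameters; this simplified single-array version is an assumption for illustration):

```python
import jax.numpy as jnp

def l2norm_pow(x):
    # Before: a fractional power, which XLA lowers to a generic pow kernel.
    return (jnp.sum(x * x)) ** 0.5

def l2norm_sqrt(x):
    # After: jnp.sqrt lowers to a dedicated sqrt op, which is typically
    # cheaper than pow(..., 0.5) on accelerators.
    return jnp.sqrt(jnp.sum(x * x))
```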
I updated the MaxText/README.md file for clarity and for it to follow the devsite style guide (mostly).
I was training a Llama model on GPU with a custom embedding. It worked fine with 12 layers, dim 1024, and sequence length 256, but the loss would become NaN after the...
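As a hedged debugging sketch (not tooling from this thread), one way to locate the step where the loss first turns NaN is JAX's built-in NaN checker plus an explicit guard in a hypothetical train loop:

```python
import jax
import jax.numpy as jnp

# Option 1: make JAX raise as soon as any op produces a NaN.
jax.config.update("jax_debug_nans", True)

# Option 2: an explicit check on the loss at each step.
def check_loss(loss, step):
    if jnp.isnan(loss):
        raise FloatingPointError(f"loss became NaN at step {step}")
    return loss
```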
Reverts google/maxtext#611. Reverting the PR since the transient issue from NVIDIA is now resolved.
Is there a plan to support PEFT methods such as LoRA training in maxtext, to support larger-model fine-tuning / continued pretraining, so that bigger models like LLaMA-3-70B can be trained...
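For reference, the core idea behind LoRA is to freeze the base weight and learn a low-rank update; a minimal JAX sketch of that idea (names, shapes, and the `lora_dense` helper are illustrative assumptions, not a proposed MaxText API):

```python
import jax
import jax.numpy as jnp

def lora_dense(x, w_frozen, lora_a, lora_b, alpha=16.0):
    # y = x @ W + (alpha / r) * x @ A @ B, where only A and B are trained.
    r = lora_a.shape[1]
    return x @ w_frozen + (alpha / r) * (x @ lora_a) @ lora_b

key = jax.random.PRNGKey(0)
d_in, d_out, r = 1024, 1024, 8
w = jax.random.normal(key, (d_in, d_out))     # frozen base weight
a = jax.random.normal(key, (d_in, r)) * 0.01  # trainable, low rank
b = jnp.zeros((r, d_out))                     # trainable, init to zero
y = lora_dense(jnp.ones((2, d_in)), w, a, b)
```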
Hello, are there any plans to add support for RecurrentGemma or Griffin? It would be interesting to see proper examples of model sharding. Thanks!
If we try to perform inference in float32, we get the error:
```
AssertionError: Key and Value Dtypes should match
```
This error comes from [this line](https://github.com/google/maxtext/blob/ebd39aa64d670fa13a313b6f776e01ad9e450321/MaxText/layers/attentions.py#L513). The origin of...
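A hedged sketch of one possible workaround at the call site: cast key and value to a common dtype before the assertion fires (the surrounding attention code and the `align_kv_dtypes` helper are assumptions, not MaxText's actual fix):

```python
import jax.numpy as jnp

def align_kv_dtypes(key, value):
    # The assertion fires when key/value dtypes diverge (e.g. float32
    # activations against a bfloat16 KV cache); cast both to a common dtype.
    common = jnp.promote_types(key.dtype, value.dtype)
    return key.astype(common), value.astype(common)
```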
@rwitten this is a draft. This type of change would be specific to a few transformer models (e.g., Gemma, Llama, GPT). It wouldn't work with MoE, or some new...
1. [GKE, recommended] [Running Maxtext with xpk](Run_MaxText_via_xpk.md) - Quick experimentation and production support
2. [GCE] [Running Maxtext with Multihost Jobs](Run_MaxText_via_multihost_job.md) - Long-running production jobs with queued resources
3. [GCE]...