maxtext
A simple, performant, and scalable JAX LLM!
Correct the path to Run_Gemma.md in README.md. This file was previously moved to the tpu folder in https://github.com/google/maxtext/commit/d6769933c1f80d741ff662e2e26c2b545837154e.
Change l2norm to use jnp.sqrt instead of **0.5. We see a speedup on small examples: https://screenshot.googleplex.com/A3GjjWQq5Dhes9b Colab notebook: http://shortn/_p369zYcGI2
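For context, a minimal sketch of the before/after shapes of this change (the actual MaxText `l2norm` operates on a pytree of parameters; this simplified single-array version is an assumption for illustration):

```python
import jax.numpy as jnp

def l2norm_pow(x):
    # Before: a fractional power, which XLA lowers to a generic pow kernel.
    return (jnp.sum(x * x)) ** 0.5

def l2norm_sqrt(x):
    # After: jnp.sqrt lowers to a dedicated sqrt op, which is typically
    # cheaper than pow(..., 0.5) on accelerators.
    return jnp.sqrt(jnp.sum(x * x))
```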
I updated the MaxText/README.md file for clarity and for it to follow the devsite style guide (mostly).
I was training a Llama model on GPU with a custom embedding. It worked fine with 12 layers, dim 1024, and sequence length 256, but the loss would become NaN after the...
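As a hedged debugging sketch (not tooling from this thread), one way to locate the step where the loss first turns NaN is JAX's built-in NaN checker plus an explicit guard in a hypothetical train loop:

```python
import jax
import jax.numpy as jnp

# Option 1: make JAX raise as soon as any op produces a NaN.
jax.config.update("jax_debug_nans", True)

# Option 2: an explicit check on the loss at each step.
def check_loss(loss, step):
    if jnp.isnan(loss):
        raise FloatingPointError(f"loss became NaN at step {step}")
    return loss
```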
Reverts google/maxtext#611. Reverting the PR since the transient issue from NVIDIA is now resolved.
Is there a plan to support PEFT methods such as LoRA training in maxtext, to support larger-model fine-tuning / continued pretraining, so that bigger models like LLaMA-3-70B can be trained...
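For reference, the core idea behind LoRA is to freeze the base weight and learn a low-rank update; a minimal JAX sketch of that idea (names, shapes, and the `lora_dense` helper are illustrative assumptions, not a proposed MaxText API):

```python
import jax
import jax.numpy as jnp

def lora_dense(x, w_frozen, lora_a, lora_b, alpha=16.0):
    # y = x @ W + (alpha / r) * x @ A @ B, where only A and B are trained.
    r = lora_a.shape[1]
    return x @ w_frozen + (alpha / r) * (x @ lora_a) @ lora_b

key = jax.random.PRNGKey(0)
d_in, d_out, r = 1024, 1024, 8
w = jax.random.normal(key, (d_in, d_out))     # frozen base weight
a = jax.random.normal(key, (d_in, r)) * 0.01  # trainable, low rank
b = jnp.zeros((r, d_out))                     # trainable, init to zero
y = lora_dense(jnp.ones((2, d_in)), w, a, b)
```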
Hello, are there any plans to add support for RecurrentGemma or Griffin? It would be interesting to see proper examples of model sharding. Thanks!
If we try to perform inference in float32, we get the error:
```
AssertionError: Key and Value Dtypes should match
```
This error comes from [this line](https://github.com/google/maxtext/blob/ebd39aa64d670fa13a313b6f776e01ad9e450321/MaxText/layers/attentions.py#L513). The origin of...
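A hedged sketch of one possible workaround at the call site: cast key and value to a common dtype before the assertion fires (the surrounding attention code and the `align_kv_dtypes` helper are assumptions, not MaxText's actual fix):

```python
import jax.numpy as jnp

def align_kv_dtypes(key, value):
    # The assertion fires when key/value dtypes diverge (e.g. float32
    # activations against a bfloat16 KV cache); cast both to a common dtype.
    common = jnp.promote_types(key.dtype, value.dtype)
    return key.astype(common), value.astype(common)
```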
@rwitten this is a draft. This type of change would be specific to a few transformer models (e.g., Gemma, Llama, GPT). It wouldn't work with MoE, or some new...
1. [GKE, recommended] [Running Maxtext with xpk](Run_MaxText_via_xpk.md) - Quick experimentation and production support
2. [GCE] [Running Maxtext with Multihost Jobs](Run_MaxText_via_multihost_job.md) - Long-running production jobs with queued resources
3. [GCE]...