maxtext
A simple, performant and scalable Jax LLM!
Llama3
Has anyone tried to train the newest models on MaxText, for instance Llama3 and Mistral v0.3? It is a bit unclear to me how much work this might be to...
The llama_or_mistral_ckpt.py script requires --base-model-path to be on the local file system, whereas --maxtext-model-path is on GCS. It would be good to change the implementation to use fsspec or tf GFile or...
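A minimal sketch of the fsspec idea from the request above, not the script's actual code: the helper names `open_any` and `read_bytes` are hypothetical, and the fallback to the built-in `open` when fsspec is not installed is an assumption added so the example runs anywhere. Reading `gs://` paths through fsspec would additionally need the `gcsfs` package.

```python
try:
    import fsspec  # optional dependency; gs:// paths also need gcsfs
except ImportError:
    fsspec = None


def open_any(path, mode="rb"):
    """Return a context manager for a local or remote (e.g. gs://) path.

    Hypothetical helper: with fsspec installed, the URL scheme picks the
    backend; without it, only local paths are supported.
    """
    if fsspec is not None:
        # fsspec.open returns an OpenFile, usable in a `with` statement.
        return fsspec.open(path, mode)
    if "://" in path:
        raise RuntimeError("remote paths need fsspec installed: " + path)
    return open(path, mode)


def read_bytes(path):
    """Read a whole file, regardless of where it lives."""
    with open_any(path, "rb") as f:
        return f.read()
```

With a helper like this, the same call would accept either a local checkpoint path or a bucket path, so the conversion script would not need separate code paths for the two flags.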
This PR modifies the mixtral parameter conversion tests to go through `gcsfuse` instead of local disk, to lower VM disk usage.
experimental_proxy only has 2 commits on it since branching:
```
$ git log --no-merges origin/experimental_proxy ^origin/main
commit 13f519e39e0d904e320c7d8a472161e4bcf03408 (HEAD -> avritt/noocdbt, origin/vivianrwu_experimental_proxy, origin/experimental_proxy, experimental_proxy)
Author: Zhihao Shan
Date: Mon Sep...
```
b/371572923 Tested on v4-128: https://cloudlogging.app.goo.gl/YqjMDsc27SxXHSLaA
allow_split_physical_axes is currently only supported for device meshes, but we should also support it for hybrid meshes. This is useful when we want to use FSDP across DCN and ICI...