Llama3
Has anyone tried to train the newest models on MaxText, for instance Llama3 and Mistral v0.3?
It is a bit unclear to me how much work it would take to support these models here. Do you, for instance, have to implement GQA (grouped-query attention) for Llama3?
If this is mainly a matter of updating the config files, it would be extremely helpful to have working config files for the newest versions of Llama, Gemma and Mistral.