
Continue to pretrain Gemma on TPU

bebechien opened this issue 10 months ago · 2 comments

Description of the feature request:

A cookbook showcasing continued pretraining of Gemma on TPU: continuing, and potentially expanding, Gemma's training to further enhance the model's capabilities and efficiency.

What problem are you trying to solve with this feature?

Large language models (LLMs) like Gemma require massive computational resources for pretraining. The choice of hardware significantly impacts the efficiency, speed, and ultimately the quality of the resulting model. Since Gemma has benefited from TPU pretraining, it's important that this strategy remains a priority to maintain its competitive edge and drive further advancements.

Any other information you'd like to share?

No response

bebechien avatar Jan 27 '25 07:01 bebechien

This is being worked on by @kinarr et al. It's not hard, but it does require reverting to the JAX sharding scheme that Gemma v1 used when it first came out; otherwise the Kaggle TPU goes OOM (and Colab TPU is hopeless).
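For readers unfamiliar with what "sharding scheme" means here: below is a minimal, hypothetical sketch of sharding a parameter array across devices with `jax.sharding`, the general mechanism the comment above refers to. The axis name, array shapes, and partition spec are illustrative only, not the cookbook's or Gemma's actual configuration; the point is that partitioning large weights across TPU cores reduces per-device memory and is what avoids the OOM.

```python
# Hypothetical sketch of JAX parameter sharding; axis names and shapes
# are made up for illustration, not taken from the Gemma codebase.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever devices are available
# (e.g. 8 TPU cores on Kaggle; a single device when run on CPU).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))

# Shard a (hypothetical) embedding table along its first dimension so
# each device holds only rows/num_devices of it.
spec = NamedSharding(mesh, P("model", None))
embedding = jax.device_put(
    jnp.zeros((4096, 256), dtype=jnp.bfloat16), spec
)

# The global shape is unchanged; only the placement differs.
print(embedding.shape)     # (4096, 256)
print(embedding.sharding)  # NamedSharding over the "model" axis
```

Which axes get sharded (and how) is exactly the kind of scheme that changed between Gemma releases; an aggressive-enough partitioning is what keeps each Kaggle TPU core within its memory budget.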

windmaple avatar Jan 27 '25 09:01 windmaple

any update on this request? @osanseviero

dadelani avatar Jul 17 '25 15:07 dadelani