gemma-cookbook
Continued pretraining of Gemma on TPU
Description of the feature request:
A cookbook notebook showcasing continued pretraining of Gemma on TPU: continuing training on additional data to further enhance the model's capabilities and efficiency.
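For reference, a minimal sketch of what such a notebook might start from, assuming the Keras 3 JAX backend and keras_nlp on a TPU runtime; the preset name, placeholder corpus, and hyperparameters are illustrative assumptions, not a confirmed recipe:

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # run Gemma on the JAX backend for TPU

import keras
import keras_nlp

# Load a pretrained Gemma checkpoint to continue pretraining from.
# "gemma_2b_en" is a placeholder preset; larger variants need sharding (see below).
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# Placeholder corpus: continued pretraining consumes raw text, not instruction pairs.
corpus = [
    "Example domain-specific document one ...",
    "Example domain-specific document two ...",
]

gemma_lm.preprocessor.sequence_length = 512
gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(corpus, epochs=1, batch_size=1)
```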
What problem are you trying to solve with this feature?
Large language models (LLMs) like Gemma require massive computational resources for pretraining. The choice of hardware significantly impacts the efficiency, speed, and ultimately the quality of the resulting model. While Gemma has benefited from TPU pretraining, it's important to ensure this strategy continues to be a priority to maintain its competitive edge and drive further advancements.
Any other information you'd like to share?
No response
This is being worked on by @kinarr et al. It is not hard, but it does require reverting the JAX sharding scheme to the one used when Gemma v1 first came out; otherwise the Kaggle TPU goes OOM (the Colab TPU is hopeless).
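As a rough illustration of that sharding setup, here is a sketch along the lines of the original Keras Gemma distributed-tuning layout, using the Keras 3 distribution API. The variable-path regexes and the `ModelParallel` call signature are assumptions that may need adjusting for the current keras_nlp Gemma layers and Keras version:

```python
import keras

# Build a 1 x N device mesh: data-parallel axis "batch", model-parallel axis "model".
devices = keras.distribution.list_devices()
device_mesh = keras.distribution.DeviceMesh(
    shape=(1, len(devices)), axis_names=("batch", "model"), devices=devices
)

# Shard the large weight matrices across the "model" axis so a single
# TPU core never has to hold the full set of parameters.
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = ("model", None)
layout_map["decoder_block.*attention.*(query|key|value).kernel"] = ("model", None, None)
layout_map["decoder_block.*attention_output.kernel"] = ("model", None, None)
layout_map["decoder_block.*ffw_gating.kernel"] = (None, "model")
layout_map["decoder_block.*ffw_linear.kernel"] = ("model", None)

# Note: newer Keras releases take the layout map as a keyword argument instead.
model_parallel = keras.distribution.ModelParallel(
    device_mesh, layout_map, batch_dim_name="batch"
)
keras.distribution.set_distribution(model_parallel)
# Any GemmaCausalLM loaded after this point is sharded across the TPU cores.
```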
Any update on this request? @osanseviero