Convert Gemma 2 to HuggingFace
Are there any scripts for converting Gemma-2 models to HuggingFace? I see there are Llama and Mistral scripts.
You can check my commit for the conversion script here.
That is fantastic. Thanks.
@hxssgaa I made a quick test of the script, trying to convert a 2B Gemma2 model. However, I am seeing this error:
ValueError: Requested shape: (2048,) is not compatible with the stored shape: (2304,). Truncating/padding is disabled by setting of strict=True. When using standard Orbax APIs, this behavior can be modified by specifying strict=False in ArrayRestoreArgs for any array in which padding/truncation is desired.
@hxssgaa I understand this is because it uses the settings from the base.yml file. However, it was not obvious to me how to get the script to rely either on the structure from the loaded model, or on the model yml files.
I also see the script refers to convert_maxtext_to_hf.py. Is that a helper file?
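The 2048-vs-2304 mismatch is what a strict Orbax restore reports when the config describes a narrower model than the one actually saved in the checkpoint. A minimal illustration of that check, with made-up array widths (this only mimics the behavior, it is not Orbax's implementation):

```python
# Illustrative only: mimics the strict shape check a checkpoint restore
# performs. The widths are stand-ins (2304 being Gemma 2 2B's embed dim).
import numpy as np

def strict_restore(stored: np.ndarray, requested_shape: tuple) -> np.ndarray:
    """Fail loudly when the requested shape differs from the stored shape."""
    if stored.shape != requested_shape:
        raise ValueError(
            f"Requested shape: {requested_shape} is not compatible "
            f"with the stored shape: {stored.shape}."
        )
    return stored

stored_scale = np.zeros(2304)            # width actually saved on disk
try:
    strict_restore(stored_scale, (2048,))  # width from a mismatched yml
except ValueError as e:
    print("restore failed:", e)
```

Pointing the script at a yml whose dimensions match the saved checkpoint makes the shapes agree and the restore succeed.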
> @hxssgaa I made a quick test of the script, trying to convert a 2B Gemma2 model. However, I am seeing this error:
>
> ValueError: Requested shape: (2048,) is not compatible with the stored shape: (2304,). Truncating/padding is disabled by setting of strict=True. When using standard Orbax APIs, this behavior can be modified by specifying strict=False in ArrayRestoreArgs for any array in which padding/truncation is desired.
Hi @peregilk, I just did another test of the conversion script for gemma2-2b and didn't hit the issue you are getting. The converted checkpoint exactly matches the official HuggingFace gemma2-2b-it. Please use the correct yml settings for conversion; your command should look like:
```shell
JAX_PLATFORMS=cpu python MaxText/gemma2_orbax_to_hf.py MaxText/configs/base.yml \
    base_output_directory=/tmp/output \
    load_parameters_path=/path/to/maxtext/checkpoint \
    model_name='gemma2-2b' \
    hf_model_path=/path/to/save/hf_model.bin \
    model_size=2b
```
> @hxssgaa I understand this is because it uses the settings from the base.yml file. However, it was not obvious to me how to get the script to rely either on the structure from the loaded model, or on the model yml files.
>
> I also see the script refers to convert_maxtext_to_hf.py. Is that a helper file?
It's a typo. I already fixed it in the latest commit; it should be gemma2_orbax_to_hf.py instead.
@hxssgaa Thanks for answering me, and sorry for asking basic questions here. Do you first save/convert the checkpoint locally to disk?
Or can /path/to/maxtext/checkpoint be the bucket where the trained checkpoints are stored, i.e. 'gs://mybucket/gemma2-2B-instruct-myfinetunedmodel1/checkpoints/0/items'?
I still don't think the example command is exactly correct, but if the checkpoint is stored locally and does not require a specific yml file, this is probably just a typo.
@peregilk, no need to save the ckpt locally; you can just point load_parameters_path to the checkpoint location in your Google bucket. Sorry for the confusion here. I have changed the ckpt conversion format to be similar to llama_or_mistral_orbax_to_huggingface.py, so the correct conversion command is:
```shell
JAX_PLATFORMS=cpu python MaxText/gemma2_orbax_to_hf.py MaxText/configs/base.yml \
    base_output_directory=/tmp/output \
    load_parameters_path=/path/to/maxtext/checkpoint \
    model_name='gemma2-27b' \
    hf_model_path=/path/to/save/hf_model.bin \
    model_size=27b
```
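Since load_parameters_path accepts either a local directory or a gs:// bucket path, a small validation helper can catch a malformed path before the (slow) restore starts. This is an illustrative helper, not part of MaxText:

```python
# Illustrative helper, not part of MaxText: accept either a local directory
# or a GCS URI such as gs://mybucket/.../checkpoints/0/items.
import os

def is_valid_checkpoint_path(path: str) -> bool:
    if path.startswith("gs://"):
        # Bucket contents can't be checked without GCS credentials; just
        # require a bucket name plus at least one path component.
        parts = path[len("gs://"):].split("/")
        return len(parts) >= 2 and all(parts[:2])
    return os.path.isdir(path)

print(is_valid_checkpoint_path("gs://mybucket/ckpts/0/items"))  # True
```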
Awesome @hxssgaa. I actually tried something similar, but I think there was a small typo in my earlier script that kept it from picking up the correct yaml.
However, now it works. I have also tested one model all the way through: I get exactly the same MMLU scores on the original google/gemma2-2b-it as on a model that was converted from Kaggle/Flax, stored as a MaxText checkpoint, and converted to HF with the gemma2_orbax_to_hf.py script.
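Before running a full benchmark like MMLU, a cheaper sanity check is to compare the two checkpoints numerically. A sketch of that comparison, using stand-in arrays (with real models you would iterate over the two state dicts):

```python
# Sketch: numerically compare two checkpoints (e.g. the official HF weights
# vs a converted one). The dicts below are stand-ins for real state dicts.
import numpy as np

def max_abs_diff(a: dict, b: dict) -> float:
    """Largest absolute element-wise difference across all shared parameters."""
    assert a.keys() == b.keys(), "checkpoints have different parameter names"
    return max(float(np.max(np.abs(a[k] - b[k]))) for k in a)

official = {"embed": np.ones((4, 4)), "norm": np.zeros(4)}
converted = {"embed": np.ones((4, 4)), "norm": np.zeros(4)}
print(max_abs_diff(official, converted))  # 0.0 for an exact match
```

An exact (or near-zero, allowing for dtype casts) result is strong evidence the conversion preserved the weights, which is consistent with getting identical MMLU scores.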
Has anyone tried converting Gemma 3 (4b) to HuggingFace yet?
I have now taken a Gemma 3 model from Kaggle --> MaxText format (Orbax) --> continued pretraining --> (I would now like to convert to HF, but it seems there is no script available; I am trying to do it myself, with no luck yet.)
@hxssgaa any chance to develop similar code to convert Gemma 3 checkpoint to HF ?
I created such a conversion script based on https://github.com/AI-Hypercomputer/maxtext/blob/f6ebc1662cb944bd7748fb350bba164b13479b68/MaxText/gemma2_orbax_to_hf.py and a bunch of trial and error with Gemini 2.5 Pro in Cursor.
I was able to run some benchmarks with the converted model and verified that the model would start GRPO finetuning with Unsloth. I can share the script later today, once I am finished with work.
Great @R4ZZ3 I will also test the code once you share it and get back to you with my findings.
Hi @salrowili
The file can now be found here: https://github.com/R4ZZ3/gemma_3_orbax_to_hf/blob/main/convert_gemma_3_orbax_to_hf.py
@gagika, can you please take a look?
This has been completed by @YixuanWang-99. Please see https://github.com/AI-Hypercomputer/maxtext/blob/main/MaxText/utils/ckpt_conversion/examples/convert_gemma2_to_hf.sh