flax icon indicating copy to clipboard operation
flax copied to clipboard

Adding nnx Gemma2-2b (including overall fixes) to examples/gemma

Open mdda opened this issue 9 months ago • 5 comments

Adds Gemma2-2b (including GQA to Attention and fixes to Block)

Fixes #4567

It includes:

  • New TransformerConfig for gemma2-2b model
  • Renames existing configs to make them uniform
    • NB: This cannot affect existing code, since this module was previously unusable
  • Added additional params reading key adjustment (key in gemma2-2b needs remapping from kaggle download)
  • Adds GQA to Attention module
  • Reorders the operations in Block module so that logits output from overall Transformer are not gibberish
    • logits confirmed to (approximately) match those from GDE gemma (flax linen) model
  • No new documentation provided
    • This change would make the example in the nnx documentation work (did not work before)
  • No additional tests provided

mdda avatar Feb 28 '25 05:02 mdda

Hey Martin! Thanks for doing this. Some folks are internally also improving the model. Let me merge their changes first and make sure there are no conflicts with those changes.

cgarciae avatar Mar 04 '25 20:03 cgarciae

Ahhh - Brings back memories of PRs for TensorFlow. Good times! /s

I'll attempt fix the white-space check failures, when you let me know that it isn't a waste of my time.

mdda avatar Mar 05 '25 05:03 mdda

Thanks @mdda for doing this, it took a while because there where other changes to the model in the background that were pending. I think this is great. Can you please add a test to check that the GQA configuration works and matches the base version?

Also now need to solve the conflicts, sorry about this, this code is being used by a couple of users internally.

cgarciae avatar Mar 14 '25 20:03 cgarciae

Surprised that the code was being used internally prior to my PR, since the Block module was entirely borked.

mdda avatar Mar 15 '25 10:03 mdda

@mdda please take a look at CI, you probably need to run:

pip install pre-commit
pre-commit run --all-files

cgarciae avatar Mar 20 '25 17:03 cgarciae

Hey @mdda , I wonder whether you would be able to finalize this PR? If you are busy, we can take over it keeping your commits (such that you will be credited for the work you have done). Please, let me know. Thanks!

vfdev-5 avatar Jul 21 '25 16:07 vfdev-5