keras-nlp
Added falcon model converter
@SamanehSaadat can you take a look at the Falcon conversion options here? I remember there were some annoying gotchas (e.g. different tokenizer types) that this might not cover.
Hi @mehtamansi29, just checking on this PR. Looks like we need to add a numerics verification notebook and swap out the 7b preset for the 1b (along with a test checkpoint for the unit testing).
Hi @mattdangerw and @JyotinderSingh -
Here is a notebook covering the 7b numerics for the Falcon model; they differ between the huggingface and keras_hub models. I'll take another look at the converter to get the numerics right.
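For anyone following along, the numerics check in the notebook boils down to running the same input through both models and comparing logits. A minimal sketch (function name, tolerance, and the `hf_logits`/`keras_logits` placeholders are illustrative, not the notebook's actual code):

```python
import numpy as np

def report_numerics(hf_logits, keras_logits, atol=1e-3):
    """Compare logits from the HF and keras_hub models on the same input."""
    diff = np.abs(np.asarray(hf_logits) - np.asarray(keras_logits))
    print(f"max abs diff:  {diff.max():.6f}")
    print(f"mean abs diff: {diff.mean():.6f}")
    return bool(diff.max() <= atol)
```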
Hi @JyotinderSingh and @mattdangerw - I’ve updated the Falcon converter to include support for both GQA (Grouped Query Attention) and MQA (Multi-Query Attention). With these changes, the converter can now handle weights for both the Falcon 1B and 7B models.
Here is the notebook where both models (Falcon 1b and 7b) load correctly. The total parameter counts are nearly identical, and the numerics line up as expected.
Hi @mehtamansi29, I ran the parameter verification colab. The 1b model's numerics match, but there is a trainable parameter mismatch for the 7b model. Could you please check it? Here is the updated Gist.
Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,923,033,472
Difference in parameters: 1,312,768
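For reference, the gap is about the size you'd expect from spurious bias vectors. A back-of-the-envelope check, assuming the published Falcon-7B config (hidden_size=4544, 32 decoder layers, 71 query heads of dim 64, MQA); the remaining 2 × hidden_size per layer could come from extra per-layer normalization parameters, so that mapping is worth checking too:

```python
hidden_size = 4544
num_layers = 32
num_heads = 71
head_dim = 64

# Fused QKV output dim under MQA: 71 query heads plus one shared K and one V head.
qkv_out = (num_heads + 2) * head_dim  # 4672
# Bias vectors on the four dense layers of each decoder block:
# fused QKV, attention output, MLP up (4x), and MLP down projections.
per_layer_bias = qkv_out + hidden_size + 4 * hidden_size + hidden_size
total_bias = per_layer_bias * num_layers
print(total_bias)  # 1,021,952 of the 1,312,768-parameter gap

# What's left over, per layer, if biases explain the rest of the gap.
leftover_per_layer = (1_312_768 - total_bias) // num_layers
print(leftover_per_layer)  # 9,088 == 2 * hidden_size
```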
Hi @sachinprasadhs - I have fixed the issue where the 7B model was failing: it uses use_bias=False, while the 1B model uses use_bias=True. The numerics for both 7B and 1B are now correct. Could you please take a look at the gist here?
7b model parameters:
Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,921,720,704
Difference in parameters: 0
1b model parameters:
Torch trainable parameters: 1,311,625,216
Keras trainable parameters: 1,311,625,216
Difference in parameters: 0
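The shape of the fix, in sketch form: read the bias flag from the checkpoint config and only copy a bias when one exists. This is a minimal illustration, assuming the `bias` key used by Falcon HF configs; the helper name, weight-dict layout, and key formats are placeholders, not the converter's actual code:

```python
import numpy as np

def port_dense(keras_weights, hf_weights, keras_name, hf_name, use_bias):
    """Copy one dense layer from an HF state dict into a Keras weight dict.

    HF Linear kernels are stored [out, in]; Keras Dense expects [in, out],
    hence the transpose. The bias is copied only when use_bias is set.
    """
    keras_weights[keras_name + "/kernel"] = hf_weights[hf_name + ".weight"].T
    if use_bias:
        keras_weights[keras_name + "/bias"] = hf_weights[hf_name + ".bias"]

# 7b-style config disables biases; the 1b-style config enables them.
hf_config = {"bias": False}
use_bias = bool(hf_config.get("bias", False))
```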