
Added falcon model converter

Open mehtamansi29 opened this issue 10 months ago • 3 comments

The Falcon model converter is missing; this PR adds it. Fixes #1988

mehtamansi29 avatar Jan 09 '25 19:01 mehtamansi29

@SamanehSaadat can you take a look at the Falcon conversion options here? I remember there were some annoying gotchas (e.g. different tokenizer types) that this might not cover.

mattdangerw avatar Jan 13 '25 22:01 mattdangerw

Hi @mehtamansi29, just checking on this PR. Looks like we need to add a numerics verification notebook and swap out the 7B preset for the 1B (along with a test checkpoint for unit testing).

JyotinderSingh avatar Mar 27 '25 09:03 JyotinderSingh
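For reference, a numerics verification notebook typically runs the same input through both checkpoints and compares the output logits. A minimal sketch of just the comparison step (pure Python here; in the actual notebook the two lists would come from the Hugging Face and keras_hub models, and the helper names are illustrative):

```python
def max_abs_diff(a, b):
    """Element-wise max absolute difference between two flat logit lists."""
    assert len(a) == len(b), "logit shapes must match"
    return max(abs(x - y) for x, y in zip(a, b))

def numerics_match(hf_logits, keras_logits, atol=1e-3):
    """True when every element agrees within the absolute tolerance."""
    return max_abs_diff(hf_logits, keras_logits) <= atol

# Illustrative values only -- real logits come from the two models.
hf = [0.1234, -2.5001, 7.8100]
kh = [0.1235, -2.5000, 7.8102]
print(numerics_match(hf, kh))  # True: float noise within tolerance
```

The tolerance is a judgment call; converted checkpoints usually agree to a few decimal places rather than bit-exactly.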

Hi @mattdangerw and @JyotinderSingh -

Here is a notebook with the 7B numerics for the Falcon model; the outputs differ between the Hugging Face and keras_hub models. I'll take another look at the converter to get the numerics to match.

mehtamansi29 avatar Apr 17 '25 13:04 mehtamansi29

Hi @JyotinderSingh and @mattdangerw - I’ve updated the Falcon converter to include support for both GQA (Grouped Query Attention) and MQA (Multi-Query Attention). With these changes, the converter can now handle weights for both the Falcon 1B and 7B models.

Here is the notebook where both models (Falcon 1B and 7B) load correctly. The total parameter counts are nearly identical, and the numerics line up as expected.

mehtamansi29 avatar Aug 19 '25 09:08 mehtamansi29
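The MQA/GQA distinction mainly affects how the fused query_key_value projection is split into query and key/value slices. A rough sketch of the split arithmetic (the function name is illustrative, not the converter's actual API; MQA is simply the `num_kv_heads == 1` case of GQA):

```python
def qkv_split_sizes(hidden_dim, num_heads, num_kv_heads):
    """Output widths (q_size, k_size, v_size) for the fused
    query_key_value projection. MQA is the num_kv_heads == 1 case;
    GQA generalizes it to a few shared key/value heads."""
    head_dim = hidden_dim // num_heads
    q_size = num_heads * head_dim        # one query slice per head
    kv_size = num_kv_heads * head_dim    # shared key/value slices
    return q_size, kv_size, kv_size

# Falcon-7B-style MQA: 71 query heads sharing a single key/value head.
print(qkv_split_sizes(4544, 71, 1))  # (4544, 64, 64)
```

A converter that assumes one layout will mis-slice the other, which is why supporting both in one code path matters here.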

Hi @mehtamansi29, I ran the parameter verification colab. The 1B model matches the numerics, but there is a trainable parameter mismatch for the 7B model. Could you please check it? Here is the updated Gist:

Torch trainable parameters: 6,921,720,704
Total Keras trainable parameters: 6,923,033,472
Difference in parameters: 1,312,768

sachinprasadhs avatar Aug 20 '25 21:08 sachinprasadhs
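A mismatch like this is the signature of extra bias vectors: a dense layer's kernel contributes `in_dim * out_dim` parameters either way, so adding a bias where the reference checkpoint has none inflates the count by exactly `out_dim` per layer. A sketch of the arithmetic (the dimensions below are illustrative, not a claim about Falcon's exact shapes):

```python
def dense_params(in_dim, out_dim, use_bias):
    """Parameter count of a Dense/Linear layer: kernel plus optional bias."""
    return in_dim * out_dim + (out_dim if use_bias else 0)

# The only difference between the two counts is one bias vector.
with_bias = dense_params(4544, 18176, use_bias=True)
without_bias = dense_params(4544, 18176, use_bias=False)
print(with_bias - without_bias)  # 18176 -- one bias vector per layer adds up
```

Summed over every projection in every transformer block, a few stray bias vectors easily account for a seven-figure parameter gap.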

Hi @sachinprasadhs - I have fixed the issue: the 7B model was failing because it uses use_bias=False, while the 1B model uses use_bias=True. The numerics for both 7B and 1B are now correct. Could you please take a look at the gist here?

7B model parameters:

Torch trainable parameters: 6,921,720,704
Total Keras trainable parameters: 6,921,720,704
Difference in parameters: 0

1B model parameters:

Torch trainable parameters: 1,311,625,216
Total Keras trainable parameters: 1,311,625,216
Difference in parameters: 0

mehtamansi29 avatar Sep 09 '25 09:09 mehtamansi29
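The fix above implies the converter should read the bias setting from the checkpoint config rather than hard-coding it. A hedged sketch of that mapping (Hugging Face Falcon configs do carry a `bias` field, but the Keras-side kwarg names here are illustrative, not the actual keras_hub signature):

```python
def backbone_kwargs_from_hf_config(hf_config):
    """Map a subset of a Hugging Face Falcon config dict to backbone
    kwargs. The Keras-side key names are illustrative."""
    return {
        "hidden_dim": hf_config["hidden_size"],
        "num_attention_heads": hf_config["num_attention_heads"],
        # Per this thread, Falcon 7B uses use_bias=False while the 1B
        # uses use_bias=True, so this must come from the config rather
        # than being hard-coded in the converter.
        "use_bias": hf_config.get("bias", False),
    }

falcon_7b_like = {"hidden_size": 4544, "num_attention_heads": 71, "bias": False}
print(backbone_kwargs_from_hf_config(falcon_7b_like)["use_bias"])  # False
```

Defaulting to False when the key is absent is an assumption; a production converter would validate the config against the known presets instead of guessing.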