keras-nlp
Added falcon model converter
@SamanehSaadat can you take a look at the Falcon conversion options here? I remember there were some annoying gotchas (e.g. different tokenizer types) that this might not cover.
Hi @mehtamansi29, just checking on this PR. Looks like we need to add a numerics verification notebook and swap out the 7b preset for the 1b (along with a test checkpoint for the unit testing).
Hi @mattdangerw and @JyotinderSingh -
Here is a notebook covering the 7b numerics for the Falcon model; they differ between the huggingface and keras_hub models. I'll take another look at the converter to get the numerics right.
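For anyone following along, the numerics check in the notebook boils down to running the same input through both models and comparing logits. A minimal sketch (function name, tolerance, and the `hf_logits`/`keras_logits` placeholders are illustrative, not the notebook's actual code):

```python
import numpy as np

def report_numerics(hf_logits, keras_logits, atol=1e-3):
    """Compare logits from the HF and keras_hub models on the same input."""
    diff = np.abs(np.asarray(hf_logits) - np.asarray(keras_logits))
    print(f"max abs diff:  {diff.max():.6f}")
    print(f"mean abs diff: {diff.mean():.6f}")
    return bool(diff.max() <= atol)
```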
Hi @JyotinderSingh and @mattdangerw - I’ve updated the Falcon converter to include support for both GQA (Grouped Query Attention) and MQA (Multi-Query Attention). With these changes, the converter can now handle weights for both the Falcon 1B and 7B models.
Here is the notebook where both models (Falcon 1b and 7b) load correctly. The total parameter counts are nearly identical, and the numerics line up as expected.
Hi @mehtamansi29, I ran the parameter verification colab. The 1b model's numerics match, but there is a trainable parameter mismatch for the 7b model. Could you please check it? Here is the updated Gist.
Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,923,033,472
Difference in parameters: 1,312,768
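For reference, the gap is about the size you'd expect from spurious bias vectors. A back-of-the-envelope check, assuming the published Falcon-7B config (hidden_size=4544, 32 decoder layers, 71 query heads of dim 64, MQA); the remaining 2 × hidden_size per layer could come from extra per-layer normalization parameters, so that mapping is worth checking too:

```python
hidden_size = 4544
num_layers = 32
num_heads = 71
head_dim = 64

# Fused QKV output dim under MQA: 71 query heads plus one shared K and one V head.
qkv_out = (num_heads + 2) * head_dim  # 4672
# Bias vectors on the four dense layers of each decoder block:
# fused QKV, attention output, MLP up (4x), and MLP down projections.
per_layer_bias = qkv_out + hidden_size + 4 * hidden_size + hidden_size
total_bias = per_layer_bias * num_layers
print(total_bias)  # 1,021,952 of the 1,312,768-parameter gap

# What's left over, per layer, if biases explain the rest of the gap.
leftover_per_layer = (1_312_768 - total_bias) // num_layers
print(leftover_per_layer)  # 9,088 == 2 * hidden_size
```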
Hi @sachinprasadhs - I have fixed the issue where the 7B model was failing: it uses use_bias=False, while the 1B model uses use_bias=True. The numerics for both 7B and 1B are now correct. Could you please take a look at the gist here?
7b model parameters:
Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,921,720,704
Difference in parameters: 0
1b model parameters:
Torch trainable parameters: 1,311,625,216
Keras trainable parameters: 1,311,625,216
Difference in parameters: 0
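The shape of the fix, in sketch form: read the bias flag from the checkpoint config and only copy a bias when one exists. This is a minimal illustration, assuming the `bias` key used by Falcon HF configs; the helper name, weight-dict layout, and key formats are placeholders, not the converter's actual code:

```python
import numpy as np

def port_dense(keras_weights, hf_weights, keras_name, hf_name, use_bias):
    """Copy one dense layer from an HF state dict into a Keras weight dict.

    HF Linear kernels are stored [out, in]; Keras Dense expects [in, out],
    hence the transpose. The bias is copied only when use_bias is set.
    """
    keras_weights[keras_name + "/kernel"] = hf_weights[hf_name + ".weight"].T
    if use_bias:
        keras_weights[keras_name + "/bias"] = hf_weights[hf_name + ".bias"]

# 7b-style config disables biases; the 1b-style config enables them.
hf_config = {"bias": False}
use_bias = bool(hf_config.get("bias", False))
```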