Granite language models [WIP]
What does this PR do?
This PR adds support for IBM's upcoming Granite LLMs (3B and 8B).
- text models: @ArthurZucker and @younesbelkada
@ArthurZucker @younesbelkada
hey @younesbelkada, this is for our upcoming open model releases: 3B and 8B language models (lots of tokens :D)
Let's just leave this PR for now; I will get back to it in a few days.
Hey! Do you need another review here?
yeah @ArthurZucker, I still need to fix the model on HF to make it work with the renamed params: https://huggingface.co/mayank-mishra/granite-3b-mup. But you can already give a review, that would be helpful.
@ArthurZucker dropping this breaks a unit test: https://github.com/huggingface/transformers/pull/31502#discussion_r1675633090
Which test?
FAILED tests/models/granite/test_modeling_granite.py::GraniteModelTest::test_model_outputs_equivalence - RuntimeError: The size of tensor a (7) must match the size of tensor b (32) at non-singleton dimension 3
Dug in a little and found that since `past_key_values` is None, it's being dropped when the model outputs a tuple.
This results in the tuple and dict outputs having a different number of elements.
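For anyone hitting the same failure, here's a minimal sketch of the mechanism (a toy dataclass, not the actual `ModelOutput` internals): a None field is silently dropped from the tuple view, so a positional comparison between the tuple and dict outputs pairs the wrong tensors and fails with a shape mismatch.

```python
# Toy illustration of why a None past_key_values desyncs tuple vs dict outputs.
from dataclasses import dataclass, fields
from typing import Optional

import torch


@dataclass
class ToyModelOutput:
    last_hidden_state: torch.Tensor
    past_key_values: Optional[tuple] = None
    attentions: Optional[tuple] = None

    def to_tuple(self):
        # Like ModelOutput.to_tuple(): fields that are None are skipped.
        return tuple(
            getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        )


out = ToyModelOutput(
    last_hidden_state=torch.randn(1, 7, 32),
    past_key_values=None,  # dropped below
    attentions=(torch.randn(1, 4, 7, 7),),
)

# The dataclass nominally has 3 fields, but the tuple only has 2, so an
# element-wise equivalence check compares `attentions` against the slot
# where `past_key_values` was expected and hits a size mismatch.
print(len(out.to_tuple()))  # 2, not 3
```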
@ArthurZucker quick question: do you think this commit is a reasonable change? https://github.com/huggingface/transformers/pull/31502/commits/b64c16d05decc13e23314e192807ccdbefdd5b85 It moves RoPE out of the attention modules, since with the padding-free integration we already pass the rotary embeddings to each attention layer.
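For context, a minimal sketch of that pattern (toy module names, not the actual Granite code): the model owns a single rotary-embedding module, computes cos/sin once per forward, and hands them to every attention layer instead of each layer instantiating its own RoPE.

```python
import torch
from torch import nn


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


class RotaryEmbedding(nn.Module):
    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, position_ids: torch.Tensor):
        # (batch, seq) -> cos/sin of shape (batch, seq, dim)
        freqs = position_ids[..., None].float() * self.inv_freq
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()


class ToyAttention(nn.Module):
    # The layer no longer owns a RotaryEmbedding; it just consumes cos/sin.
    def forward(self, hidden_states, cos, sin):
        return hidden_states * cos + rotate_half(hidden_states) * sin


class ToyModel(nn.Module):
    def __init__(self, num_layers: int, head_dim: int):
        super().__init__()
        self.rotary_emb = RotaryEmbedding(head_dim)  # single shared module
        self.layers = nn.ModuleList(ToyAttention() for _ in range(num_layers))

    def forward(self, hidden_states, position_ids):
        cos, sin = self.rotary_emb(position_ids)  # computed once per forward
        for layer in self.layers:
            hidden_states = layer(hidden_states, cos, sin)  # passed down
        return hidden_states


model = ToyModel(num_layers=2, head_dim=64)
x = torch.randn(1, 7, 64)
pos = torch.arange(7).unsqueeze(0)
print(model(x, pos).shape)  # torch.Size([1, 7, 64])
```

This also plays nicely with padding-free attention, since the position ids (and hence cos/sin) only need to be materialized once per batch.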
Yep absolutely perfect
Cool! Ping me once this needs a merge ~= when checkpoints are released!
@ArthurZucker this is ready for merge
I have addressed the changes
Thanks for bearing with me 🤗
passed docs 🥳
thanks Arthur :)
Thank you as well! 🤗
hello!
Torch versions below 2.4.0 fail with `module 'torch.nn' has no attribute 'RMSNorm'`, since `nn.RMSNorm` was only added in torch 2.4.0.
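One possible workaround (a sketch of a compatibility shim, not the official fix in transformers): fall back to a hand-rolled RMSNorm when `torch.nn.RMSNorm` is missing.

```python
import torch
from torch import nn

if hasattr(nn, "RMSNorm"):
    RMSNorm = nn.RMSNorm  # available in torch >= 2.4.0
else:

    class RMSNorm(nn.Module):
        """Fallback RMSNorm for torch < 2.4.0."""

        def __init__(self, normalized_shape: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(normalized_shape))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Normalize by the root-mean-square over the last dimension.
            variance = x.pow(2).mean(-1, keepdim=True)
            return self.weight * x * torch.rsqrt(variance + self.eps)


norm = RMSNorm(64)
print(norm(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```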