Granite language models [WIP]
What does this PR do?
This PR adds support for IBM's upcoming Granite LLMs (3B and 8B).
- text models: @ArthurZucker and @younesbelkada
@ArthurZucker @younesbelkada
hey @younesbelkada, this is for our upcoming open model releases: 3B and 8B language models (lots of tokens :D)
Let's just leave this PR for now; I will get back to it in a few days.
Hey! Do you need another review here?
yeah @ArthurZucker, I still need to fix the model on HF to make it work with the renamed params: https://huggingface.co/mayank-mishra/granite-3b-mup. But you can already give a review, that would be helpful.
@ArthurZucker dropping this breaks a unit test: https://github.com/huggingface/transformers/pull/31502#discussion_r1675633090
Which test?
FAILED tests/models/granite/test_modeling_granite.py::GraniteModelTest::test_model_outputs_equivalence - RuntimeError: The size of tensor a (7) must match the size of tensor b (32) at non-singleton dimension 3
Dug in a little and found that since `past_key_values` is None, it's being dropped when the model outputs a tuple.
This results in the tuple and dict outputs having a different number of elements.
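For anyone hitting the same failure, here's a minimal sketch of the mechanism (a toy dataclass, not the actual `ModelOutput` internals): a None field is silently dropped from the tuple view, so a positional comparison between the tuple and dict outputs pairs the wrong tensors and fails with a shape mismatch.

```python
# Toy illustration of why a None past_key_values desyncs tuple vs dict outputs.
from dataclasses import dataclass, fields
from typing import Optional

import torch


@dataclass
class ToyModelOutput:
    last_hidden_state: torch.Tensor
    past_key_values: Optional[tuple] = None
    attentions: Optional[tuple] = None

    def to_tuple(self):
        # Like ModelOutput.to_tuple(): fields that are None are skipped.
        return tuple(
            getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        )


out = ToyModelOutput(
    last_hidden_state=torch.randn(1, 7, 32),
    past_key_values=None,  # dropped below
    attentions=(torch.randn(1, 4, 7, 7),),
)

# The dataclass nominally has 3 fields, but the tuple only has 2, so an
# element-wise equivalence check compares `attentions` against the slot
# where `past_key_values` was expected and hits a size mismatch.
print(len(out.to_tuple()))  # 2, not 3
```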
@ArthurZucker quick question: do you think this commit is a reasonable change? https://github.com/huggingface/transformers/pull/31502/commits/b64c16d05decc13e23314e192807ccdbefdd5b85 It moves RoPE out of the attention modules, since with the padding-free integration we already pass the rotary embeddings to each attention layer.
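For context, a minimal sketch of that pattern (toy module names, not the actual Granite code): the model owns a single rotary-embedding module, computes cos/sin once per forward, and hands them to every attention layer instead of each layer instantiating its own RoPE.

```python
import torch
from torch import nn


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


class RotaryEmbedding(nn.Module):
    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, position_ids: torch.Tensor):
        # (batch, seq) -> cos/sin of shape (batch, seq, dim)
        freqs = position_ids[..., None].float() * self.inv_freq
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()


class ToyAttention(nn.Module):
    # The layer no longer owns a RotaryEmbedding; it just consumes cos/sin.
    def forward(self, hidden_states, cos, sin):
        return hidden_states * cos + rotate_half(hidden_states) * sin


class ToyModel(nn.Module):
    def __init__(self, num_layers: int, head_dim: int):
        super().__init__()
        self.rotary_emb = RotaryEmbedding(head_dim)  # single shared module
        self.layers = nn.ModuleList(ToyAttention() for _ in range(num_layers))

    def forward(self, hidden_states, position_ids):
        cos, sin = self.rotary_emb(position_ids)  # computed once per forward
        for layer in self.layers:
            hidden_states = layer(hidden_states, cos, sin)  # passed down
        return hidden_states


model = ToyModel(num_layers=2, head_dim=64)
x = torch.randn(1, 7, 64)
pos = torch.arange(7).unsqueeze(0)
print(model(x, pos).shape)  # torch.Size([1, 7, 64])
```

This also plays nicely with padding-free attention, since the position ids (and hence cos/sin) only need to be materialized once per batch.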
Yep absolutely perfect
Cool! Ping me once this needs a merge ~= when checkpoints are released!
@ArthurZucker this is ready for merge
I have addressed the changes
Thanks for bearing with me 🤗
passed docs 🥳
thanks Arthur :)
Thank you as well! 🤗
hello!
Torch versions below 2.4.0 fail with `module 'torch.nn' has no attribute 'RMSNorm'`, since `nn.RMSNorm` was only added in torch 2.4.0.
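One possible workaround (a sketch of a compatibility shim, not the official fix in transformers): fall back to a hand-rolled RMSNorm when `torch.nn.RMSNorm` is missing.

```python
import torch
from torch import nn

if hasattr(nn, "RMSNorm"):
    RMSNorm = nn.RMSNorm  # available in torch >= 2.4.0
else:

    class RMSNorm(nn.Module):
        """Fallback RMSNorm for torch < 2.4.0."""

        def __init__(self, normalized_shape: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(normalized_shape))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Normalize by the root-mean-square over the last dimension.
            variance = x.pow(2).mean(-1, keepdim=True)
            return self.weight * x * torch.rsqrt(variance + self.eps)


norm = RMSNorm(64)
print(norm(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```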