
Granite language models [WIP]

Open mayank31398 opened this issue 1 year ago • 2 comments

What does this PR do?

This PR adds support for IBM's upcoming 3B and 8B LLMs.

  • text models: @ArthurZucker and @younesbelkada

mayank31398 avatar Jun 19 '24 18:06 mayank31398

@ArthurZucker @younesbelkada

amyeroberts avatar Jun 20 '24 09:06 amyeroberts

Hey @younesbelkada, this is for our upcoming open model releases: 3B and 8B language models (lots of tokens :D)

Let's just leave this PR for now. I will get back to it in a few days.

mayank31398 avatar Jun 20 '24 18:06 mayank31398

Hey! Do you need another review here?

ArthurZucker avatar Jul 22 '24 13:07 ArthurZucker

Yeah, @ArthurZucker, I need to fix the model on HF so it works with the renamed params: https://huggingface.co/mayank-mishra/granite-3b-mup but you can give a review in the meantime, that would be helpful.

mayank31398 avatar Jul 22 '24 15:07 mayank31398

@ArthurZucker dropping this breaks a unit test: https://github.com/huggingface/transformers/pull/31502#discussion_r1675633090

mayank31398 avatar Jul 22 '24 17:07 mayank31398

Which test?

ArthurZucker avatar Jul 22 '24 17:07 ArthurZucker

FAILED tests/models/granite/test_modeling_granite.py::GraniteModelTest::test_model_outputs_equivalence - RuntimeError: The size of tensor a (7) must match the size of tensor b (32) at non-singleton dimension 3

Dug a little and found that since past_key_values is None, it's being removed when the model returns a tuple. This results in the tuple and dict outputs having a different number of elements.
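For context, a minimal sketch of how the mismatch surfaces (the tiny config values here are hypothetical, just to make the model instantiable):

```python
import torch
from transformers import GraniteConfig, GraniteModel

# Hypothetical tiny config purely for illustration; real checkpoints
# were not public at the time of this discussion.
config = GraniteConfig(hidden_size=32, num_hidden_layers=2,
                       num_attention_heads=4, vocab_size=128)
model = GraniteModel(config).eval()
inputs = torch.randint(0, config.vocab_size, (1, 7))

with torch.no_grad():
    dict_out = model(inputs, use_cache=False, return_dict=True)
    tuple_out = model(inputs, use_cache=False, return_dict=False)

# If None entries (e.g. past_key_values when use_cache=False) are dropped
# from one container but not the other, the two outputs end up with
# different lengths and test_model_outputs_equivalence fails.
print(len(dict_out.to_tuple()), len(tuple_out))
```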

mayank31398 avatar Jul 22 '24 17:07 mayank31398

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker quick question: do you think this commit is a reasonable change? https://github.com/huggingface/transformers/pull/31502/commits/b64c16d05decc13e23314e192807ccdbefdd5b85 It moves RoPE out of the attention modules, since with the padding-free integration we already pass RoPE to each attention layer.
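Roughly the pattern being described (a simplified sketch, not the actual diff; class and argument names are illustrative):

```python
import torch
import torch.nn as nn

class RotaryEmbedding(nn.Module):
    """Computes RoPE cos/sin once per forward pass at the model level."""
    def __init__(self, dim, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, position_ids):
        # position_ids: (batch, seq_len) -> cos/sin: (batch, seq_len, dim)
        freqs = position_ids[..., None].float() * self.inv_freq
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()

class Model(nn.Module):
    def __init__(self, layers, head_dim):
        super().__init__()
        self.rotary_emb = RotaryEmbedding(head_dim)  # single shared module
        self.layers = nn.ModuleList(layers)

    def forward(self, hidden_states, position_ids):
        # RoPE is computed once here instead of inside every attention module,
        # then handed to each layer; with padding-free batching the
        # per-sequence position_ids are already threaded through the layers.
        position_embeddings = self.rotary_emb(position_ids)
        for layer in self.layers:
            hidden_states = layer(hidden_states,
                                  position_embeddings=position_embeddings)
        return hidden_states
```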

mayank31398 avatar Jul 30 '24 15:07 mayank31398

Yep absolutely perfect

ArthurZucker avatar Jul 30 '24 15:07 ArthurZucker

Cool! Ping me once this needs a merge ~= when checkpoints are released!

ArthurZucker avatar Aug 05 '24 05:08 ArthurZucker

@ArthurZucker this is ready for merge

mayank31398 avatar Aug 27 '24 10:08 mayank31398

I have addressed the requested changes.

mayank31398 avatar Aug 27 '24 15:08 mayank31398

Thanks for bearing with me 🤗

ArthurZucker avatar Aug 27 '24 18:08 ArthurZucker

passed docs 🥳

mayank31398 avatar Aug 27 '24 18:08 mayank31398

thanks Arthur :)

mayank31398 avatar Aug 27 '24 19:08 mayank31398

Thank you as well! 🤗

ArthurZucker avatar Aug 28 '24 06:08 ArthurZucker

hello!

module 'torch.nn' has no attribute 'RMSNorm'

Versions of torch < 2.4.0 will report this error, since torch.nn.RMSNorm was only added in PyTorch 2.4.0.
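One possible workaround until older torch versions are handled (a hedged sketch, not what transformers actually ships): fall back to a manual RMSNorm when torch.nn.RMSNorm is unavailable.

```python
import torch
import torch.nn as nn

if hasattr(nn, "RMSNorm"):
    # torch >= 2.4 ships a built-in RMSNorm
    RMSNorm = nn.RMSNorm
else:
    class RMSNorm(nn.Module):
        """Minimal RMSNorm fallback for torch < 2.4."""
        def __init__(self, hidden_size, eps=1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))
            self.eps = eps

        def forward(self, x):
            # Normalize by the root-mean-square of the last dimension.
            variance = x.pow(2).mean(-1, keepdim=True)
            return self.weight * x * torch.rsqrt(variance + self.eps)
```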

Jintao-Huang avatar Aug 28 '24 08:08 Jintao-Huang