Mayank Mishra

Results: 187 comments by Mayank Mishra

I don't think that 3b and 8b are working yet, @DigitLib. The 34b and 20b PR is merged and it's working: https://github.com/ggerganov/llama.cpp/pull/7324. The 20b-base GGUF is available now: https://huggingface.co/ibm-granite/granite-20b-code-base-GGUF. I will...
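For anyone who wants a quick smoke test of the 20b GGUF, something like this should work with llama-cpp-python (the quant filename below is just a guess; list the repo files to find the one you actually want):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# NOTE: the filename is an assumption for illustration; check the repo
# for the actual quantizations that were uploaded.
path = hf_hub_download(
    repo_id="ibm-granite/granite-20b-code-base-GGUF",
    filename="granite-20b-code-base.Q4_K_M.gguf",
)

llm = Llama(model_path=path, n_ctx=2048)
out = llm("def quicksort(arr):", max_tokens=64)
print(out["choices"][0]["text"])
```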

If [that commit](https://github.com/sroecker/llama.cpp/commit/36dc5bbffe083545045ec2441ddc7f5c085d3caf) is working, can we open a PR, @sroecker?

Hey @younesbelkada, this is for our upcoming open model releases: 3B and 8B language models (lots of tokens :D). Let's just leave this PR for now. I will get back...

Yeah, @ArthurZucker, I need to fix the model on HF to make it work with the renamed params: https://huggingface.co/mayank-mishra/granite-3b-mup. But you can give a review in the meantime; that would be helpful.
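For reference, fixing the checkpoint basically means remapping the old parameter names to the renamed ones in the state dict, roughly like this (the names and paths here are made up for illustration, not the actual Granite mapping):

```python
import torch

# Hypothetical rename map; the real old -> new mapping depends on the
# renamed params in the modeling code.
RENAMES = {
    "transformer.wte.weight": "model.embed_tokens.weight",
}

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
fixed = {RENAMES.get(name, name): tensor for name, tensor in state_dict.items()}
torch.save(fixed, "pytorch_model.bin")
```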

@ArthurZucker dropping this breaks a unit test: https://github.com/huggingface/transformers/pull/31502#discussion_r1675633090

```
FAILED tests/models/granite/test_modeling_granite.py::GraniteModelTest::test_model_outputs_equivalence - RuntimeError: The size of tensor a (7) must match the size of tensor b (32) at non-singleton dimension 3
```

Dug a little and found that since `past_key_values`...
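For anyone hitting this: it looks like the usual KV-cache shape mismatch, where the attention scores span past + current tokens but the mask only covers the query length. A minimal repro of the same broadcast error (shapes are illustrative, not the actual test's):

```python
import torch

batch, heads, q_len, kv_len = 1, 4, 7, 32

# Attention scores over the full cache: (batch, heads, q_len, kv_len)
scores = torch.randn(batch, heads, q_len, kv_len)

# A mask built from the query length only: (batch, 1, q_len, q_len)
bad_mask = torch.zeros(batch, 1, q_len, q_len)

try:
    scores + bad_mask  # raises the same kind of size mismatch at dimension 3
except RuntimeError as e:
    print(e)

# The fix: the mask's last dimension must cover past + current tokens.
good_mask = torch.zeros(batch, 1, q_len, kv_len)
out = scores + good_mask  # broadcasts cleanly
```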

@ArthurZucker quick question: do you think this commit is a reasonable change? https://github.com/huggingface/transformers/pull/31502/commits/b64c16d05decc13e23314e192807ccdbefdd5b85 It moves RoPE out of the attention modules, since with the padding-free integration we already...
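For context, the idea is roughly this: compute the rotary cos/sin tables once at the model level and pass them into every attention layer, instead of recomputing them per layer. A minimal sketch of the wiring (toy names, not the actual transformers code):

```python
import torch
import torch.nn as nn

def rope_cos_sin(position_ids, head_dim, base=10000.0):
    # Build the rotary cos/sin tables once per forward pass.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    freqs = position_ids[..., None].float() * inv_freq   # (batch, seq, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)               # (batch, seq, head_dim)
    return emb.cos(), emb.sin()

class ToyAttention(nn.Module):
    # Consumes precomputed cos/sin instead of recomputing RoPE itself.
    def forward(self, hidden_states, cos, sin):
        return hidden_states  # rotation elided; only the wiring matters here

class ToyModel(nn.Module):
    def __init__(self, num_layers=2, head_dim=64):
        super().__init__()
        self.head_dim = head_dim
        self.layers = nn.ModuleList(ToyAttention() for _ in range(num_layers))

    def forward(self, hidden_states, position_ids):
        # RoPE is computed once at the model level ...
        cos, sin = rope_cos_sin(position_ids, self.head_dim)
        for layer in self.layers:
            # ... and the same tables are shared by every attention layer.
            hidden_states = layer(hidden_states, cos, sin)
        return hidden_states

model = ToyModel()
y = model(torch.randn(1, 7, 128), torch.arange(7)[None, :])
```

This avoids redundant per-layer computation and makes it straightforward to feed in position ids derived from the padding-free layout.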

@ArthurZucker this is ready for merge

I have addressed the requested changes.

Docs check passed 🥳