Edoardo Cetin

7 comments by Edoardo Cetin

A minimal example of this erroneous behavior can be reproduced via:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda', torch_dtype=torch.bfloat16...
```
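(The comment above is truncated by the results page. A complete reproduction along these lines might look like the sketch below; the forward call with `output_attentions=True` and the test prompt are assumptions based on the surrounding discussion, not the original script.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="cuda", torch_dtype=torch.bfloat16
)

# Assumed remainder of the truncated snippet: run a forward pass that
# requests attention weights, the setting where the masking bug surfaces.
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)
print(outputs.attentions[0].shape)
```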

> Great catch.
>
> 1. `causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)` line 1127 needs to be ignored as well.
> 2. We need to add your small example script as a...
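For context, the kind of guard being discussed could look roughly like the sketch below. `AttentionMaskConverter._unmask_unattended` is the helper quoted above from `transformers.modeling_attn_mask_utils`; the surrounding condition is an illustrative assumption, not the exact merged diff.

```python
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

# Sketch: skip the SDPA-specific unmasking when attentions are requested,
# since output_attentions=True falls back to the eager attention path,
# which expects the original causal mask.
if (
    self.config._attn_implementation == "sdpa"
    and not output_attentions
):
    causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
```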

> Feel free to rebase it might be fixed on main / be flaky

Just did :)

@ArthurZucker Let me know if you think this fix is ready for merging, or if you'd like me to add the tests to the same PR!

> Would be nice to just add the test in this PR 😉

Alright - I added `output_attentions=True` to the SDPA equivalence test, as you suggested ;)...
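An equivalence check of this shape is sketched below: run the same inputs through an SDPA and an eager copy of the model with `output_attentions=True` and compare the outputs. The model choice and tolerances are assumptions, not the exact test added in the PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # any SDPA-capable checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("Hello world", return_tensors="pt")

model_sdpa = AutoModelForCausalLM.from_pretrained(
    model_name, attn_implementation="sdpa", torch_dtype=torch.bfloat16
)
model_eager = AutoModelForCausalLM.from_pretrained(
    model_name, attn_implementation="eager", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    out_sdpa = model_sdpa(**inputs, output_attentions=True)
    out_eager = model_eager(**inputs, output_attentions=True)

# With output_attentions=True the SDPA model falls back to eager attention,
# so both logits and attention weights should match the eager model closely.
torch.testing.assert_close(out_sdpa.logits, out_eager.logits, rtol=1e-3, atol=1e-3)
for a_sdpa, a_eager in zip(out_sdpa.attentions, out_eager.attentions):
    torch.testing.assert_close(a_sdpa, a_eager, rtol=1e-3, atol=1e-3)
```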

@ArthurZucker thanks for your suggestions! I also propagated the same changes to the new jetmoe model. All default checks are now passing ^^

@jdvin thanks for your patch! +1, this would be a great feature to merge! Perhaps @liangfu could help review these?