Sanchit Gandhi

26 issue results from Sanchit Gandhi

Hey @ayaka14732! Super cool repo - thanks for working on this! With @vvvm23, we're working on adding the Flax LLaMA model to HF Transformers: https://github.com/huggingface/transformers/pull/24587 Just thought I'd let you...

RoPE:
* Applied to the q/k/v states in the self-attention
* Applied to the q states only in the cross-attention (not the k/v states)
* The rationale is that the...
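For context, a minimal PyTorch sketch of how a rotary position embedding (RoPE) rotates a set of attention states. The function name, tensor shapes, and base frequency here are assumptions for illustration, not the implementation discussed in the thread:

```python
import torch

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, position_ids, base=10000.0):
    # x: (batch, num_heads, seq_len, head_dim); position_ids: (batch, seq_len)
    head_dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    freqs = position_ids[:, None, :, None].float() * inv_freq  # (batch, 1, seq_len, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)
    return x * emb.cos() + rotate_half(x) * emb.sin()

# In a typical decoder self-attention block the same rotation is applied to the
# query and key states before computing attention scores, e.g.:
# q = apply_rope(q, position_ids); k = apply_rope(k, position_ids)
```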

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌 I'm working on the Hugging Face implementation, and...

Evaluating any Mistral checkpoint on MMLU throws an error, suggesting that there are several tokens when only one is permitted:
```
accelerate launch --num_processes=1 run_evals_accelerate.py \
    --model_args "pretrained=hf-internal-testing/tiny-random-MistralForCausalLM" \
    ...
```
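One quick way to sanity-check this kind of failure (a diagnostic sketch, not part of the original report) is to tokenize the single-letter MMLU continuations directly and count how many token IDs each one produces for the Mistral tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-MistralForCausalLM")
for continuation in [" A", " B", " C", " D"]:
    ids = tokenizer(continuation, add_special_tokens=False)["input_ids"]
    # A single-token evaluation task expects len(ids) == 1 for each continuation
    print(repr(continuation), ids, "->", len(ids), "token(s)")
```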

# What does this PR do?

Supersedes https://github.com/huggingface/transformers/pull/28931 and extends it by adding static k/v cache support for Whisper. Also improves the performance of the eager attention implementation by removing...
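Roughly, generation with a static k/v cache looks like the sketch below. The checkpoint, dummy audio input, and generate kwargs are assumptions for illustration and are not taken from the PR itself:

```python
import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Dummy 1-second waveform; real usage would pass log-mel features of actual audio
inputs = processor(torch.zeros(16000).numpy(), sampling_rate=16000, return_tensors="pt")

# A static cache pre-allocates the k/v tensors to a fixed shape, keeping tensor
# shapes constant across decoding steps (which is what makes torch.compile-style
# optimisations possible for the decoder)
generated_ids = model.generate(
    inputs.input_features, cache_implementation="static", max_new_tokens=32
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```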

# What does this PR do?

Generation currently fails on `main` for mps devices:
```python
from transformers.models.gemma2 import Gemma2ForCausalLM, Gemma2Config
import torch

config = Gemma2Config(num_hidden_layers=1, vocab_size=128, hidden_size=16, intermediate_size=32, num_attention_heads=1, num_key_value_heads=1)
...
```
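The original repro is truncated above; purely as a hypothetical illustration (not the PR's actual snippet), a tiny-model generation run on the Apple-silicon `mps` backend typically looks like this:

```python
import torch
from transformers.models.gemma2 import Gemma2Config, Gemma2ForCausalLM

# Same tiny config as above, rebuilt here so the sketch is self-contained
config = Gemma2Config(
    num_hidden_layers=1, vocab_size=128, hidden_size=16,
    intermediate_size=32, num_attention_heads=1, num_key_value_heads=1,
)
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = Gemma2ForCausalLM(config).to(device)

# Random token IDs stand in for a real prompt with this untrained tiny model
input_ids = torch.randint(0, config.vocab_size, (1, 8), device=device)
output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
print(output.shape)
```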