Sanchit Gandhi

26 issue results from Sanchit Gandhi

Hey @ayaka14732! Super cool repo - thanks for working on this! With @vvvm23, we're working on adding the Flax LLaMA model to HF Transformers: https://github.com/huggingface/transformers/pull/24587 Just thought I'd let you...

RoPE:
* Applied to the q/k/v states in the self-attention
* Applied to the q states only in the cross-attention (not the k/v states)
* The rationale is that the...
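For context, a minimal PyTorch sketch of how a rotary position embedding (RoPE) rotates a set of attention states. The function name, tensor shapes, and base frequency here are assumptions for illustration, not the implementation discussed in the thread:

```python
import torch

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, position_ids, base=10000.0):
    # x: (batch, num_heads, seq_len, head_dim); position_ids: (batch, seq_len)
    head_dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    freqs = position_ids[:, None, :, None].float() * inv_freq  # (batch, 1, seq_len, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)
    return x * emb.cos() + rotate_half(x) * emb.sin()

# In a typical decoder self-attention block the same rotation is applied to the
# query and key states before computing attention scores, e.g.:
# q = apply_rope(q, position_ids); k = apply_rope(k, position_ids)
```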

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌 I'm working on the Hugging Face implementation, and...

Evaluating any Mistral checkpoint on MMLU throws an error, suggesting that there are several tokens when only one is permitted:
```
accelerate launch --num_processes=1 run_evals_accelerate.py \
    --model_args "pretrained=hf-internal-testing/tiny-random-MistralForCausalLM" \
    ...
```
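One quick way to sanity-check this kind of failure (a diagnostic sketch, not part of the original report) is to tokenize the single-letter MMLU continuations directly and count how many token IDs each one produces for the Mistral tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-MistralForCausalLM")
for continuation in [" A", " B", " C", " D"]:
    ids = tokenizer(continuation, add_special_tokens=False)["input_ids"]
    # A single-token evaluation task expects len(ids) == 1 for each continuation
    print(repr(continuation), ids, "->", len(ids), "token(s)")
```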

# What does this PR do?

Supersedes https://github.com/huggingface/transformers/pull/28931 and extends it by adding static k/v cache support for Whisper. Also improves the performance of the eager attention implementation by removing...
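Roughly, generation with a static k/v cache looks like the sketch below. The checkpoint, dummy audio input, and generate kwargs are assumptions for illustration and are not taken from the PR itself:

```python
import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Dummy 1-second waveform; real usage would pass log-mel features of actual audio
inputs = processor(torch.zeros(16000).numpy(), sampling_rate=16000, return_tensors="pt")

# A static cache pre-allocates the k/v tensors to a fixed shape, keeping tensor
# shapes constant across decoding steps (which is what makes torch.compile-style
# optimisations possible for the decoder)
generated_ids = model.generate(
    inputs.input_features, cache_implementation="static", max_new_tokens=32
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```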

# What does this PR do?

Generation currently fails on `main` for mps devices:
```python
from transformers.models.gemma2 import Gemma2ForCausalLM, Gemma2Config
import torch

config = Gemma2Config(num_hidden_layers=1, vocab_size=128, hidden_size=16, intermediate_size=32, num_attention_heads=1, num_key_value_heads=1)
...
```
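The original repro is truncated above; purely as a hypothetical illustration (not the PR's actual snippet), a tiny-model generation run on the Apple-silicon `mps` backend typically looks like this:

```python
import torch
from transformers.models.gemma2 import Gemma2Config, Gemma2ForCausalLM

# Same tiny config as above, rebuilt here so the sketch is self-contained
config = Gemma2Config(
    num_hidden_layers=1, vocab_size=128, hidden_size=16,
    intermediate_size=32, num_attention_heads=1, num_key_value_heads=1,
)
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = Gemma2ForCausalLM(config).to(device)

# Random token IDs stand in for a real prompt with this untrained tiny model
input_ids = torch.randint(0, config.vocab_size, (1, 8), device=device)
output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
print(output.shape)
```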