Mixtral manual `head_dim`
Feature request
https://github.com/huggingface/transformers/blob/816f4424964c1a1631e303b663fc3d68f731e923/src/transformers/models/mixtral/modeling_mixtral.py#L284
`head_dim` in the Mixtral model is forced to the value of hidden_size // num_heads. However, this is not the case in the Llama model or even in the Mistral model. So it would be a nice minor feature to support a manual `head_dim` setting for the Mixtral model as well!
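For illustration, here is a minimal runnable sketch of the difference (using a hypothetical `Config` stand-in rather than the actual transformers config class; the exact attribute handling in the linked revision may differ):

```python
# Hypothetical config stand-in, only for demonstrating the two behaviours.
class Config:
    hidden_size = 4096
    num_attention_heads = 32
    head_dim = 256  # manually chosen, not hidden_size // num_attention_heads


config = Config()

# Current Mixtral behaviour: head_dim is always derived, so the manual value is ignored.
forced_head_dim = config.hidden_size // config.num_attention_heads  # -> 128

# Llama/Mistral-style behaviour: use config.head_dim when set, otherwise fall back to the derived value.
manual_head_dim = getattr(
    config, "head_dim", config.hidden_size // config.num_attention_heads
)  # -> 256

print(forced_head_dim, manual_head_dim)
```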
Motivation
- manual `head_dim` in the Llama or Mistral model
Your contribution
I can submit a PR.