Possible config mismatch between TGI and transformers (`hidden_act` vs. `hidden_activation`)
System Info
n/a
Information
- [ ] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
I just wanted to raise a potential config mismatch between TGI and transformers, namely `hidden_act` vs. `hidden_activation` for gemma2 (and most likely other models):
- in Transformers (main), `hidden_act` is legacy and is overridden by `hidden_activation` (link)
- in TGI (main), only `hidden_act` is used (link)
Expected behavior
`hidden_activation` should be supported in TGI, taking precedence over `hidden_act` if both are specified.
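To make this concrete, here is a minimal sketch of the requested resolution order. The `resolve_activation` helper, the dict-based config, and the hard-coded default are illustrative only, not actual TGI or transformers code:

```python
# Illustrative sketch only: prefer `hidden_activation`, fall back to the
# legacy `hidden_act`, then to the gemma2 default used by transformers.
def resolve_activation(config: dict) -> str:
    return (
        config.get("hidden_activation")
        or config.get("hidden_act")
        or "gelu_pytorch_tanh"
    )


# A gemma2-style config that sets both keys: transformers resolves it to
# `hidden_activation`, while a loader that only reads `hidden_act` would
# silently pick a different activation.
config = {"hidden_act": "gelu", "hidden_activation": "gelu_pytorch_tanh"}
print(resolve_activation(config))  # gelu_pytorch_tanh (matches transformers)
print(config["hidden_act"])        # gelu (what a hidden_act-only reader gets)
```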
Thank you @xenova 🫡
Should this be changed across all models or just gemma2?
I first encountered this with gemma2, but looking at the transformers source code (and at how new models are added), it looks like this should indeed be a global change 👍 (maybe not for some legacy models)
This is fixed for gemma models in this PR: https://github.com/huggingface/text-generation-inference/pull/2381
I'll still keep this open until the other models have been gone through as well.
I've now checked these (a quick way to repeat this kind of check is sketched after the list):
- Cohere: `hidden_act`
- Deepseek: `hidden_act`
- Gemma 1: `hidden_act`
- Llama 3: `hidden_act`
- Mistral: `hidden_act`
- Mixtral: `hidden_act`
- Neox: `hidden_act`
- Phi: `hidden_act`
- Qwen: `hidden_act`
- StarCoder: `hidden_act`
- Idefics 1: `hidden_act`
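For reference, a small script along these lines can spot-check which published configs define `hidden_activation` as opposed to only `hidden_act`. The repo IDs below are examples rather than the exact checkpoints listed above, and gated repos (e.g. Llama 3, Gemma) would additionally need an access token:

```python
# Spot-check which of the two keys a few public config.json files define.
# Repo IDs are illustrative; swap in the checkpoints you care about.
import json

from huggingface_hub import hf_hub_download

REPOS = [
    "mistralai/Mistral-7B-v0.1",
    "mistralai/Mixtral-8x7B-v0.1",
    "bigcode/starcoder2-7b",
]

for repo_id in REPOS:
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path) as f:
        cfg = json.load(f)
    present = [k for k in ("hidden_act", "hidden_activation") if k in cfg]
    print(f"{repo_id}: {', '.join(present) or 'neither key present'}")
```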
So for now I think we should be good 👍 I'll close the issue, but feel free to reopen if any others come up!