Possible config mismatch between TGI and transformers (`hidden_act` vs. `hidden_activation`)
System Info
n/a
Information
- [ ] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
I just wanted to raise a potential config mismatch between TGI and transformers, namely `hidden_act` vs. `hidden_activation` for gemma2 (and most likely other models):
- in Transformers (main), `hidden_act` is legacy and is overridden by `hidden_activation` (link)
- in TGI (main), only `hidden_act` is used (link)
Expected behavior
`hidden_activation` should be supported in TGI, taking precedence over `hidden_act` if both are specified.
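To make this concrete, here is a minimal sketch of the requested resolution order. The `resolve_activation` helper, the dict-based config, and the hard-coded default are illustrative only, not actual TGI or transformers code:

```python
# Illustrative sketch only: prefer `hidden_activation`, fall back to the
# legacy `hidden_act`, then to the gemma2 default used by transformers.
def resolve_activation(config: dict) -> str:
    return (
        config.get("hidden_activation")
        or config.get("hidden_act")
        or "gelu_pytorch_tanh"
    )


# A gemma2-style config that sets both keys: transformers resolves it to
# `hidden_activation`, while a loader that only reads `hidden_act` would
# silently pick a different activation.
config = {"hidden_act": "gelu", "hidden_activation": "gelu_pytorch_tanh"}
print(resolve_activation(config))  # gelu_pytorch_tanh (matches transformers)
print(config["hidden_act"])        # gelu (what a hidden_act-only reader gets)
```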
Thank you @xenova 🫡
Should this be changed across all models or just gemma2?
I first encountered this with gemma2, but looking at the transformers source code (and at how new models are added), it looks like this should indeed be a global change 👍 (maybe not for some legacy models)
This is fixed for gemma models in this PR: https://github.com/huggingface/text-generation-inference/pull/2381
I'll still keep this open until the other models have been gone through as well.
I've now checked these (a quick way to repeat this kind of check is sketched after the list):
- Cohere: `hidden_act`
- Deepseek: `hidden_act`
- Gemma 1: `hidden_act`
- Llama 3: `hidden_act`
- Mistral: `hidden_act`
- Mixtral: `hidden_act`
- Neox: `hidden_act`
- Phi: `hidden_act`
- Qwen: `hidden_act`
- StarCoder: `hidden_act`
- Idefics 1: `hidden_act`
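For reference, a small script along these lines can spot-check which published configs define `hidden_activation` as opposed to only `hidden_act`. The repo IDs below are examples rather than the exact checkpoints listed above, and gated repos (e.g. Llama 3, Gemma) would additionally need an access token:

```python
# Spot-check which of the two keys a few public config.json files define.
# Repo IDs are illustrative; swap in the checkpoints you care about.
import json

from huggingface_hub import hf_hub_download

REPOS = [
    "mistralai/Mistral-7B-v0.1",
    "mistralai/Mixtral-8x7B-v0.1",
    "bigcode/starcoder2-7b",
]

for repo_id in REPOS:
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path) as f:
        cfg = json.load(f)
    present = [k for k in ("hidden_act", "hidden_activation") if k in cfg]
    print(f"{repo_id}: {', '.join(present) or 'neither key present'}")
```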
So for now I think we should be good 👍 I'll close the issue, but feel free to reopen if any others come up!