
Possible config mismatch between TGI and transformers (`hidden_act` vs. `hidden_activation`)

xenova opened this issue 1 year ago • 2 comments

System Info

n/a

Information

  • [ ] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

I just wanted to raise a potential config mismatch between TGI and transformers: `hidden_act` vs. `hidden_activation` for gemma2 (and most likely other models):

  • in Transformers (main), `hidden_act` is legacy and is overridden by `hidden_activation` (link)
  • in TGI (main), only `hidden_act` is read (link)

Expected behavior

`hidden_activation` should be supported in TGI, taking precedence over `hidden_act` when both are specified.
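The precedence rule described above can be sketched as follows. This is a minimal illustration, assuming the model config has already been loaded from `config.json` into a plain dict; the helper name `resolve_hidden_activation` is hypothetical and is not TGI's actual code. It also deliberately applies no default activation when neither key is present.

```python
import json
from typing import Any, Mapping, Optional


def resolve_hidden_activation(config: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: pick the activation name from a model config.

    `hidden_activation` wins when it is set; otherwise fall back to the
    legacy `hidden_act` key. Returns None if neither key is present.
    """
    activation = config.get("hidden_activation")
    if activation is not None:
        return activation
    # Legacy key, only consulted when `hidden_activation` is absent or null.
    return config.get("hidden_act")


# Example: a config that sets both keys (values are illustrative only).
raw = '{"hidden_act": "gelu", "hidden_activation": "gelu_pytorch_tanh"}'
config = json.loads(raw)
print(resolve_hidden_activation(config))  # gelu_pytorch_tanh
```

Treating an explicit `null` for `hidden_activation` the same as a missing key keeps older configs that only set `hidden_act` working unchanged.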

xenova, Jul 26 '24

Thank you @xenova 🫡

Should this be changed across all models or just gemma2?

ErikKaum, Jul 29 '24

I encountered this first for gemma2, but looking at the transformers source code (and seeing how new models are added), it looks like this is indeed a global change 👍 (maybe not for some legacy models)

xenova, Jul 29 '24

This is fixed for gemma models in this PR: https://github.com/huggingface/text-generation-inference/pull/2381

I'll keep this open until the other models have been checked as well.

ErikKaum, Aug 09 '24

I've now checked these:

  • Cohere: hidden_act
  • Deepseek: hidden_act
  • Gemma 1: hidden_act
  • Llama 3: hidden_act
  • Mistral: hidden_act
  • Mixtral: hidden_act
  • Neox: hidden_act
  • Phi: hidden_act
  • Qwen: hidden_act
  • StarCoder: hidden_act
  • Idefics 1: hidden_act

So for now I think we should be good 👍 I'll close the issue; feel free to reopen it if other cases come up!

ErikKaum, Aug 09 '24