h2o-llmstudio

[BUG] Tokenizer config has add_bos_token=true while LLM Studio is training with add_special_tokens=False

Open pascal-pfeiffer opened this issue 1 year ago • 1 comments

🐛 Bug

The generated tokenizer_config.json sets add_bos_token=true, but H2O LLM Studio trains with add_special_tokens=False. Loading the exported model with the default AutoTokenizer can therefore tokenize inputs differently at inference than they were tokenized during training.

We should be explicit and correct about this and set add_bos_token=false in the generated config.
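Until the export is fixed, an already-exported config can be patched by hand. The following is a minimal sketch of that workaround, not the fix inside LLM Studio itself: the helper name `disable_auto_special_tokens` is hypothetical, it uses only the Python standard library, and it assumes the flags are stored as top-level booleans in tokenizer_config.json. `add_eos_token` is included by default on the assumption that it should match the training setup as well.

```python
import json
from pathlib import Path


def disable_auto_special_tokens(config_path, keys=("add_bos_token", "add_eos_token")):
    """Set the given flags to False in a tokenizer_config.json file.

    Hypothetical helper for illustration. Returns a dict mapping each
    flag that was changed to its previous value.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    changed = {}
    for key in keys:
        # Only touch flags that are present and not already False.
        if key in config and config[key] is not False:
            changed[key] = config[key]
            config[key] = False

    # Rewrite the file only if something actually changed.
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Calling `disable_auto_special_tokens("tokenizer_config.json")` on the downloaded file (the path is a placeholder) returns the flags it flipped, so an empty dict means the config already matched.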

To Reproduce

Fine-tune a model, then download it or push it to HF.

LLM Studio version

<=1.4.1, b70b04f68d16ae73524d7f38f45e571ddb92cfc3

Mar 21 '24 07:03 pascal-pfeiffer

add_eos_token=false as well

Mar 21 '24 07:03 psinger
