h2o-llmstudio
[BUG] Tokenizer config has add_bos_token=true while LLM Studio is training with add_special_tokens=False
🐛 Bug
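The generated tokenizer_config.json has add_bos_token=true, while H2O LLM Studio trains with add_special_tokens=False.
When the exported model is loaded with the default AutoTokenizer, the two behave differently: the tokenizer prepends a BOS token by default, which it did not do during training.
We should be explicit and correct about this and set add_bos_token=false in the generated config.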
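Until this is fixed in the export, an already downloaded tokenizer_config.json can be patched by hand. The following is only a minimal sketch of that workaround, not the LLM Studio fix itself: the helper name disable_auto_special_tokens is hypothetical, and it assumes the file is plain JSON and that only flags already present in it should be changed.

```python
import json
from pathlib import Path


def disable_auto_special_tokens(config_path):
    """Set add_bos_token / add_eos_token to False in a tokenizer_config.json.

    Hypothetical helper for illustration; it is not part of H2O LLM Studio.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Only flip flags that the exported config already defines, so tokenizer
    # classes that do not know these options are left untouched.
    for key in ("add_bos_token", "add_eos_token"):
        if key in config:
            config[key] = False

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage might look like disable_auto_special_tokens("path/to/exported_model/tokenizer_config.json"), where the path is a placeholder for the downloaded model directory.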
To Reproduce
Fine-tune a model, then download it or push it to the Hugging Face Hub.
LLM Studio version
<=1.4.1, b70b04f68d16ae73524d7f38f45e571ddb92cfc3
add_eos_token should be set to false as well.