openhathi_instruct

Tokenizer issue in the OpenHathi model

Open ONE-THING-9 opened this issue 2 years ago • 0 comments

AutoTokenizer and LlamaTokenizer (the class Sarvam used) behave differently with this model. AutoTokenizer sometimes splits words that are in the vocabulary into several tokens, while LlamaTokenizer tokenizes them correctly.
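To check this on a specific set of words, a minimal sketch like the one below compares two tokenizers word by word and reports the words they split differently. It assumes that AutoTokenizer is resolving to the fast, Rust-backed `LlamaTokenizerFast` class (its default when a fast tokenizer is available), while `LlamaTokenizer` is the slow SentencePiece-based class. That difference is a plausible but unconfirmed cause of the mismatch. `diff_tokenizations` and `load_tokenizer` are hypothetical helper names, and the word list is yours to supply.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Repo id taken from the linked Hugging Face discussion.
REPO_ID = "sarvamai/OpenHathi-7B-Hi-v0.1-Base"

# Any function that turns a string into a list of token strings,
# e.g. the bound method `tokenizer.tokenize` of a Hugging Face tokenizer.
TokenizeFn = Callable[[str], List[str]]


def diff_tokenizations(
    words: Iterable[str], tokenize_a: TokenizeFn, tokenize_b: TokenizeFn
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {word: (tokens_a, tokens_b)} for each word the two tokenizers split differently."""
    diffs: Dict[str, Tuple[List[str], List[str]]] = {}
    for word in words:
        tokens_a, tokens_b = tokenize_a(word), tokenize_b(word)
        if tokens_a != tokens_b:
            diffs[word] = (tokens_a, tokens_b)
    return diffs


def load_tokenizer(repo_id: str = REPO_ID, use_fast: bool = True):
    """Load the model's tokenizer through AutoTokenizer (hypothetical helper).

    use_fast=True is AutoTokenizer's default and should give the fast LlamaTokenizerFast;
    use_fast=False should give the slow, SentencePiece-based LlamaTokenizer.
    Requires `transformers` and `sentencepiece` plus access to the Hugging Face Hub.
    """
    # Imported lazily so diff_tokenizations can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id, use_fast=use_fast)
```

For example, `fast = load_tokenizer()` and `slow = load_tokenizer(use_fast=False)`, followed by `diff_tokenizations(words, fast.tokenize, slow.tokenize)`, lists the words where the two disagree. If the slow tokenizer is the correct one, passing `use_fast=False` (or loading `LlamaTokenizer` directly, as Sarvam does) is a likely workaround, not a confirmed fix.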

https://huggingface.co/sarvamai/OpenHathi-7B-Hi-v0.1-Base/discussions/5

ONE-THING-9, Dec 25 '23 12:12