The token id exceeds the size of tokenizer.vocab_size

Open zcharon opened this issue 1 year ago • 1 comment

tokenizer.vocab_size = 12800, so why does a token id of 12800 appear? Shouldn't every token id be < tokenizer.vocab_size?

Jul 18 '24 13:07 zcharon

I'm not aware of such a constraint. Can you share more details on how this impacts your work?
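
In case it helps with gathering those details, here is a minimal sketch of how one might list the ids that fall outside `[0, vocab_size)`. The helper name `ids_outside_base_vocab` and the sample ids are hypothetical, and 12800 is simply the number quoted in the question. One possible explanation, which is an assumption since the thread does not say which tokenizer is in use: in Hugging Face `transformers` tokenizers, `vocab_size` reports only the base vocabulary, while added or special tokens get ids at or above it and are counted by `len(tokenizer)`.

```python
def ids_outside_base_vocab(token_ids, vocab_size):
    """Return the token ids that are not in the range [0, vocab_size)."""
    return [t for t in token_ids if not 0 <= t < vocab_size]


# Hypothetical values for illustration only; 12800 mirrors the question.
vocab_size = 12800
token_ids = [1, 57, 12799, 12800]

outliers = ids_outside_base_vocab(token_ids, vocab_size)
print(outliers)
```

If the reported ids turn out to belong to added or special tokens, any embedding table or range check would need to be sized from the full token count rather than from `vocab_size` alone.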

Jul 31 '24 17:07 subramen