llama3

The token id exceeds the size of tokenizer.vocab_size

Open • zcharon opened this issue 1 year ago • 1 comment

tokenizer.vocab_size = 12800, so why does a token id of 12800 appear? Shouldn't every token id be less than tokenizer.vocab_size?

zcharon • Jul 18 '24 13:07

I'm not aware of such a constraint. Can you share more details on how this impacts your work?

subramen • Jul 31 '24 17:07
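
One possible explanation, assuming the `tokenizer` in the question is a Hugging Face `transformers` tokenizer (the issue does not say): there, `vocab_size` is documented as the size of the base vocabulary without added tokens, while `len(tokenizer)` counts the added tokens too. Special tokens appended after the base vocabulary can therefore get ids equal to or greater than `vocab_size`. The sketch below only illustrates that layout with a hypothetical `ToyTokenizer` class; it is not the library's implementation, and the token names are made up.

```python
class ToyTokenizer:
    """Hypothetical tokenizer: a base vocabulary plus special tokens added later."""

    def __init__(self, base_tokens, special_tokens):
        # Base tokens get ids 0 .. len(base_tokens) - 1.
        self._token_to_id = {tok: i for i, tok in enumerate(base_tokens)}
        self._base_size = len(base_tokens)
        # Special tokens are appended after the base vocabulary,
        # so their ids start at the base vocabulary size.
        for tok in special_tokens:
            self._token_to_id[tok] = len(self._token_to_id)

    @property
    def vocab_size(self):
        # Counts only the base vocabulary, not the added tokens.
        return self._base_size

    def __len__(self):
        # Counts base tokens and added special tokens together.
        return len(self._token_to_id)

    def convert_tokens_to_ids(self, tokens):
        return [self._token_to_id[t] for t in tokens]


tok = ToyTokenizer(["a", "b", "c"], ["<bos>", "<eos>"])
bos_id = tok.convert_tokens_to_ids(["<bos>"])[0]
print(tok.vocab_size, len(tok), bos_id)  # prints: 3 5 3
```

Under this assumption, the bound that holds for every id is `token_id < len(tokenizer)`, not `token_id < tokenizer.vocab_size`, so anything sized by the number of ids (for example an embedding table) would likely need the former.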