llama3
The token id exceeds the size of tokenizer.vocab_size
tokenizer.vocab_size = 12800, so why does token id 12800 appear? Shouldn't every token id be < tokenizer.vocab_size?
I'm not aware of such a constraint. Can you share more details on how this impacts your work?
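For context, many tokenizers (including the Hugging Face convention that Llama-family tokenizers follow) report only the *base* vocabulary in `vocab_size`; special tokens added on top of it are assigned ids starting at `vocab_size`, so an id equal to or greater than `vocab_size` can be perfectly valid. The `ToyTokenizer` below is a hypothetical minimal sketch of that convention, not the actual llama3 tokenizer:

```python
# Toy illustration of the common convention: `vocab_size` counts only the
# base vocabulary, while added special tokens get ids AT and beyond it.

class ToyTokenizer:
    def __init__(self, base_vocab, special_tokens):
        self.vocab_size = len(base_vocab)  # base vocabulary only
        self._ids = {tok: i for i, tok in enumerate(base_vocab)}
        # Special tokens are appended after the base vocabulary,
        # so their ids start at vocab_size.
        for offset, tok in enumerate(special_tokens):
            self._ids[tok] = self.vocab_size + offset

    def token_to_id(self, token):
        return self._ids[token]

    def __len__(self):
        # Total table size: base vocabulary plus special tokens.
        return len(self._ids)


tok = ToyTokenizer(base_vocab=["a", "b", "c"], special_tokens=["<eos>"])
print(tok.vocab_size)            # 3
print(tok.token_to_id("<eos>"))  # 3 -- equal to vocab_size, by design
print(len(tok))                  # 4 -- includes the special token
```

If your stack works this way, comparing ids against the total table size (here `len(tok)`) rather than `vocab_size` avoids the apparent contradiction.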