gemma_pytorch
Are there reserved/unused tokens for developers?
Because a BPE vocabulary cannot be expanded dynamically after training, some BPE-tokenizer-based models such as Qwen reserve around 2k extra unused tokens at the end of the vocabulary for developers to use as they see fit during finetuning.
Does Gemma have a list of internally unused tokens?
Sometimes model makers resize the vocab to a nice GPU-friendly multiple, which creates unused tokens, or intentionally leave some tokens unused, as Qwen does.
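In case it helps, here is a minimal sketch for checking this yourself, assuming the Gemma SentencePiece `tokenizer.model` is available locally (the file path and the name-based heuristic are assumptions, not something from the repo docs). It scans the vocabulary for pieces whose names suggest they are reserved/unused:

```python
# Sketch: list vocabulary pieces that look reserved/unused in a SentencePiece model.
import sentencepiece as spm

# Path is an assumption; point this at the Gemma tokenizer.model you downloaded.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

unused = []
for token_id in range(sp.get_piece_size()):
    piece = sp.id_to_piece(token_id)
    # Heuristic: pieces explicitly named "unused" or "reserved".
    if "unused" in piece.lower() or "reserved" in piece.lower():
        unused.append((token_id, piece))

print(f"vocab size: {sp.get_piece_size()}")
print(f"candidate unused tokens: {len(unused)}")
for token_id, piece in unused[:10]:
    print(token_id, piece)
```

If such pieces exist, it would also be good to know whether they are officially safe for developers to repurpose, or merely padding to a GPU-friendly vocab size.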