Special token_ids in tokenizer
I have noticed that there are 100 `additional_special_tokens` entries in the `tokenizer_config.json` of the official repo on Hugging Face, but I could not find any other place where these special tokens are used. Could you please share any information about them?
```json
"32000": {
  "content": "<extra_id_99>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
"32001": {
  "content": "<extra_id_98>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
"32002": {
  "content": "<extra_id_97>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
```
These tokens have no particular usage by default. They are reserved placeholder slots, so you can repurpose them as your own special tokens when you train or fine-tune the model, without having to resize the embedding matrix.
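For reference, the id layout of these reserved tokens can be reconstructed from the config snippet above. A minimal sketch, assuming the pattern shown there continues (ids 32000 through 32099 covering `<extra_id_99>` down to `<extra_id_0>`, as is common for such sentinel tokens):

```python
# Sketch: rebuild the id -> token mapping for the 100 reserved
# <extra_id_N> placeholders, extrapolated from the three entries
# shown in tokenizer_config.json (32000 -> <extra_id_99>, etc.).
VOCAB_BASE = 32000  # first id after the base vocabulary (assumption)
NUM_EXTRA = 100     # number of reserved placeholder tokens

extra_id_to_token = {
    VOCAB_BASE + i: f"<extra_id_{NUM_EXTRA - 1 - i}>"
    for i in range(NUM_EXTRA)
}

print(extra_id_to_token[32000])  # <extra_id_99>
print(extra_id_to_token[32099])  # <extra_id_0>
```

In practice, since these ids are already present in the vocabulary, you can map one of them to a new meaning (e.g. a custom control token) during fine-tuning simply by using it in your training data, instead of calling `tokenizer.add_tokens` and resizing the model's embeddings.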