
Special token_ids in tokenizer

Open: Happenmass opened this issue 1 year ago • 1 comment

I noticed that "tokenizer_config.json" in the official Hugging Face repo defines 100 additional_special_tokens, but I could not find anywhere else that these special tokens are used. Could you share any information about them?

    "32000": {
      "content": "<extra_id_99>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32001": {
      "content": "<extra_id_98>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32002": {
      "content": "<extra_id_97>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },

Happenmass, Jul 02 '24 08:07

These tokens have no particular use at the moment, but you could repurpose them if you need to add tokens when you train the model.

ylacombe, Aug 01 '24 15:08
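
For anyone who wants to try the suggestion above, here is a minimal sketch of one way the reserved tokens could be repurposed at the data-preparation stage. It is not something the maintainers describe: the marker strings `[laugh]` and `[sigh]`, the helper names, and the idea of substituting markers in the text before tokenization are all assumptions made for illustration. The config fragment only mirrors the three entries quoted in the question, trimmed to the two fields the sketch reads.

```python
import json

# Entries shaped like the ones quoted in the question (only two fields kept).
CONFIG_FRAGMENT = json.loads("""
{
  "32000": {"content": "<extra_id_99>", "special": true},
  "32001": {"content": "<extra_id_98>", "special": true},
  "32002": {"content": "<extra_id_97>", "special": true}
}
""")


def reserved_tokens(entries):
    """Return {token_string: token_id} for every entry flagged as special."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in entries.items()
        if entry.get("special")
    }


def assign_markers(markers, reserved):
    """Pair each custom marker with one reserved token (hypothetical scheme)."""
    pool = sorted(reserved, key=reserved.get)  # lowest token id first
    if len(markers) > len(pool):
        raise ValueError("not enough reserved tokens for the requested markers")
    return dict(zip(markers, pool))


def rewrite(text, marker_map):
    """Replace custom markers with reserved token strings before tokenization."""
    for marker, token in marker_map.items():
        text = text.replace(marker, token)
    return text


reserved = reserved_tokens(CONFIG_FRAGMENT)
# "[laugh]" and "[sigh]" are made-up markers, not part of the model or repo.
marker_map = assign_markers(["[laugh]", "[sigh]"], reserved)
print(rewrite("Hello [laugh] there [sigh]", marker_map))
```

Because the substituted strings are already in the vocabulary, this approach would not need the vocabulary or embedding size to change. The model would still have to be trained on data that uses the markers before they carry any meaning.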