num-start and num-end
Thanks for the fantastic work you present!
https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct/blob/main/special_tokens_map.json
The tokenizer includes two special tokens, num-start and num-end. I want to ask whether these two tokens were used in pre-training, so that the model can handle numbers specifically.
No, these two special tokens were not used during pre-training, so they do not have any effect.
Thanks for your reply. Is there a suggested special token for handling numbers specifically? Were <|arithmetic_start|> and <|arithmetic_end|> used in pre-training? How were the "role" tokens used in pre-training?
Other special tokens for handling numbers, such as <|arithmetic_start|> and <|arithmetic_end|>, were also not used during pre-training. The "role" tokens in the tokenizer were not used in pre-training either.
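For anyone who wants to see which special tokens the tokenizer declares before deciding what to repurpose in fine-tuning, a minimal sketch of inspecting a special_tokens_map.json-style file is below. The token names shown are illustrative placeholders based on this thread, not a verbatim copy of the file at the URL above; check the actual file for the real names. Note that a token being declared in the map does not mean it was seen in pre-training, so its embedding may be effectively untrained.

```python
import json

# Illustrative excerpt in the shape of a special_tokens_map.json file.
# The real LLaDA file may list different tokens -- this is only a sketch.
sample = json.loads("""
{
  "additional_special_tokens": [
    "<|num_start|>",
    "<|num_end|>",
    "<|arithmetic_start|>",
    "<|arithmetic_end|>"
  ]
}
""")

# Enumerate the declared special tokens. Tokens that never appeared in the
# pre-training data would still need fine-tuning for their embeddings to
# carry any meaning.
for tok in sample["additional_special_tokens"]:
    print(tok)
```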