DeepSeek-V2 icon indicating copy to clipboard operation
DeepSeek-V2 copied to clipboard

Question about the design of bos and eos token

Open jojo23333 opened this issue 5 months ago • 0 comments

Hi, Thanks for the great work. I'm just in general curious about whether there is a reason to use the Chinese version of '|' and '▁'instead of the '|' , ‘_’ which is standard ASCII characters in eos_token and bos_token. ('<|end▁of▁sentence|>' and '<|begin▁of▁sentence|>' ). Is this for distinguishing deep seek model from English only LLM's like Llamma?

image

jojo23333 avatar Aug 28 '24 00:08 jojo23333