The Japanese language tag is <|ja|>, not <|jp|>, which contradicts the comment in the example

Open zengzengqwq opened this issue 1 month ago • 0 comments

Describe the bug: The documentation/example writes the Japanese language tag differently from what the code actually supports.

https://github.com/FunAudioLLM/CosyVoice/blob/main/example.py#L21 The comment here lists: <|zh|><|en|><|jp|><|yue|><|ko|>

https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/tokenizer/tokenizer.py#L19 However, the language tokens that the tokenizer actually registers come from the keys of LANGUAGES, where Japanese is ja.

https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/tokenizer/tokenizer.py#L182 In addition, get_encoding() registers *[f"<|{lang}|>" for lang in list(LANGUAGES.keys())], so the supported tag is <|ja|>, not <|jp|>.

As a result, when <|jp|> is used, the tokenizer does not treat it as a special token, which leads to problems such as the one reported in https://github.com/FunAudioLLM/CosyVoice/issues/621
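A minimal sketch of the mismatch, using only the standard library. The `LANGUAGES` dict below is an abbreviated stand-in containing just the five codes mentioned in this issue (an assumption; the real dict in `tokenizer.py` is longer), and `unregistered_tags` is a hypothetical helper written for illustration, not part of CosyVoice. It only checks against language tokens, not the other special tokens the real tokenizer registers.

```python
import re

# Abbreviated stand-in for LANGUAGES in cosyvoice/tokenizer/tokenizer.py.
# Assumption: only the codes discussed in this issue are listed here.
LANGUAGES = {
    "en": "english",
    "zh": "chinese",
    "ja": "japanese",
    "ko": "korean",
    "yue": "cantonese",
}

# Same expression quoted above from get_encoding().
LANGUAGE_TOKENS = [f"<|{lang}|>" for lang in list(LANGUAGES.keys())]

# Matches anything shaped like a <|...|> tag.
TAG_PATTERN = re.compile(r"<\|[^|<>]+\|>")


def unregistered_tags(text, registered=LANGUAGE_TOKENS):
    """Hypothetical helper: return <|...|> tags in text that are not registered language tokens."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in registered]


print(unregistered_tags("<|jp|>こんにちは"))  # ['<|jp|>']
print(unregistered_tags("<|ja|>こんにちは"))  # []
```

Because `<|jp|>` is not among the registered tokens, a tokenizer built this way would encode it as ordinary text rather than as a single language token.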

zengzengqwq · Dec 16 '25 04:12