FireRedTeam comments

Results: 16 comments of FireRedTeam

About Semantic-Aware Speech Tokenizer

> Hi, Great Job! I want to finetune the model using my own data. However, I don't find the Semantic-Aware Speech Tokenizer in your open-source codes. Do you...

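Dropped syllables and glitchy output

1. Because of the training corpus, some symbols may fall outside what we handle; you can process those manually.
2. We will release a new model very soon to improve stability.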

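Dropped syllables and glitchy output

> Looking forward to it. Another issue is that characters go missing at the end of sentences.

The new model was uploaded to Hugging Face yesterday; just use the latest model.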

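Dropped syllables and glitchy output

> It still drops syllables.

This is an overlong-sentence problem (your text exceeds the maximum length the model can process). We will integrate sentence-splitting logic that cuts long sentences into short ones before synthesis.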
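Until that splitting logic is integrated, a workaround along these lines can be sketched. This is a minimal illustration and not the FireRedTeam implementation: the function name `split_text`, the punctuation set, and the `max_chars` default are all assumptions, and the model's actual maximum length is not stated in this thread.

```python
import re

# Split after Chinese/English sentence- and clause-level punctuation.
# The ASCII period is left out on purpose so decimals like "3.14" stay intact.
_BREAK_AFTER = re.compile(r"(?<=[。！？!?；;，,])")


def split_text(text: str, max_chars: int = 50) -> list[str]:
    """Cut text into chunks of at most max_chars, preferring punctuation breaks.

    max_chars is a hypothetical limit; tune it to whatever the model tolerates.
    """
    pieces = [p for p in _BREAK_AFTER.split(text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # A single clause longer than the limit is hard-wrapped.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new chunk when adding this clause would exceed the limit.
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the resulting audio concatenated in order.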

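Dropped syllables and glitchy output

> Another problem is that generation is slow, and there are intermittent errors.

This is also caused by overly long input sentences; we will address it in the same upcoming fix.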

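Dropped syllables and glitchy output

> Finally, I'd suggest supporting mixed-language input; keeping the languages separate really isn't ideal.

For multilingual text, just use the zh tag. Your suggestion is noted, though, and we will gradually try to remove the language tag.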

Dropped syllables and glitchy output

The model and code have been updated.

Training and Inference Code

> Coming from arxiv website. This paper is super cool imo. Would love to train this model for my use case. Are you planning to release the training and Inference...

Training and Inference Code

> have a dev group ?

@deyituo we are considering that, please wait for the update. For now, if you run into any problems, please open an issue to discuss the...

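Is there any material on support for various dialects?

The eight Mandarin subdialects in the KeSpeech test set can serve as a reference. We ran tests on KeSpeech; for the results, see the paper or run the test yourself.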