EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Results 113 EmotiVoice issues

I have deployed the service and can reach it, as shown: ![image](https://github.com/netease-youdao/EmotiVoice/assets/48005813/bdb83244-9e03-47c0-b199-23bae389e0c4) However, when I run the following curl commands: curl -X GET "http://192.168.1.228:8501/v1/docs" -H "Content-Type: application/json" -H "Accept: application/json" curl -X GET "http://192.168.1.228:8501/api/doc" -H "Content-Type: application/json" -H "Accept: application/json" curl -X GET "http://192.168.1.228:8501/v1/audio/speech" -H "Content-Type: application/json" -H...
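The report is cut off before the result of these requests, so the actual failure is not known. One thing that may be worth checking, assuming the server exposes an OpenAI-style speech endpoint at `/v1/audio/speech`, is that such endpoints typically expect a POST with a JSON body rather than a GET. The sketch below only builds the request and does not send it. The host and port are copied from the report; the `model` and `voice` values and the helper name `build_speech_request` are hypothetical placeholders, not confirmed EmotiVoice parameters.

```python
import json
import urllib.request

# Host and port copied from the curl commands in the report.
BASE_URL = "http://192.168.1.228:8501"


def build_speech_request(text, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint."""
    payload = {
        "model": "emoti-voice",  # hypothetical model name
        "input": text,           # text to synthesize
        "voice": voice,          # hypothetical speaker ID
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request("Hello", voice="8051")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; whether the server accepts this payload shape depends on the deployed version.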

# MFA Step6 This command has already been run: ``` mfa validate \ --overwrite \ --clean \ --single_speaker \ data/DataBaker/mfa/lab \ data/DataBaker/mfa/mfa_pronounciation_dict.txt ``` This command fails with an error: ``` mfa train \ --overwrite \ --clean \ --single_speaker \ data/DataBaker/mfa/lab...

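Looking at the [voicelist](https://github.com/netease-youdao/EmotiVoice/wiki/%F0%9F%98%8A-voice-wiki-page) in the documentation, the speakers all appear to be foreigners. They can read Chinese text, but a foreign accent is still audible. However, the documentation for the paid API mentions Chinese speakers such as shudingli and youxiaofu. Where can I find a list of native Chinese speaker IDs? Or could you give the numeric IDs of a few native Chinese speakers? Thanks.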

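These audio clips were generated with the API. Not every generated clip has the problem, but quite a few start with a click or pop, like an electrical crackle. What causes this? Has anyone else run into it?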
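The cause is not identified in this report. As a possible post-processing workaround, and only under the assumption that the click comes from the waveform starting abruptly at a non-zero amplitude, a short linear fade-in can be applied to the first few milliseconds of each clip. The sketch below works on a plain list of integer PCM samples; the function name `apply_fade_in` and the 5 ms default are illustrative choices, not part of EmotiVoice.

```python
def apply_fade_in(samples, sample_rate, fade_ms=5.0):
    """Return a copy of `samples` with a linear fade-in over the first `fade_ms` milliseconds."""
    # Number of samples covered by the fade, capped at the clip length.
    fade_len = min(len(samples), int(sample_rate * fade_ms / 1000))
    faded = list(samples)
    for i in range(fade_len):
        # Gain ramps linearly from 0 towards 1 across the fade window.
        faded[i] = int(samples[i] * i / fade_len)
    return faded


# A clip that starts abruptly at a large amplitude.
clip = [12000] * 160
smoothed = apply_fade_in(clip, sample_rate=16000, fade_ms=5.0)
```

Samples after the fade window are left untouched, so only the onset of the clip changes.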

In the phrase 重点关注, the model keeps reading 重点 as chong 2 dian 3; it should be zhong 4 dian 3.

Hi there, I have a lot of emotional telephone recordings, and we want to fine-tune the model to add more voices and emotions. Is there any good way...

When the demo page synthesizes long text on the GPU, the GPU memory stays occupied and is not released.

Can I fine-tune the model with dialect audio, e.g. Cantonese or Minnan?

Hello, I ran into a confusing issue. text1: 他们两人的很普通并看似幼稚的对话,很好地表现了这一点。而后又通过看新娘,看龙舟等事,告诉我们两个人虽然天真单纯,却并不单一。于是,人物不是扁平如纸,而是有了充实的血肉。 phonelist1: ![image](https://github.com/netease-youdao/EmotiVoice/assets/32287808/e2a00ac2-9a5d-4d55-8a0d-863190267c8e) text2: 而后又通过看新娘,看龙舟等事,告诉我们两个人虽然天真单纯,却并不单一。于是,人物不是扁平如纸,而是有了充实的血肉。 phonelist2: ![image](https://github.com/netease-youdao/EmotiVoice/assets/32287808/235e273d-f8cf-4426-8955-b3c2b3b0514e) In 'wav1' the speech becomes very fast when it reaches "于是", while 'wav2' sounds normal and the phonelists are...