OpenVoice real-time voice: Will real-time voice conversion be supported in the future?

I ran your project end to end, and I sincerely congratulate you on what you have achieved. The default TTS does not handle Chinese very well: there are some tone errors, and some characters come out sounding like a Guangxi accent 😂. When I replaced the TTS output with real human recordings, the results were much better, and even male-to-female conversion worked well. If real-time voice conversion were supported in the future, the potential of this project would expand considerably. I believe it would attract a lot of attention in China's pan-entertainment live-streaming market and in in-game voice chat, even with latency on the order of seconds. I hope this project achieves real commercial success.

Jan 05 '24 09:01 printlin

Hi - I plan to call for the community's developers to do this. Could you rewrite this issue in English so that more people can understand?

Jan 05 '24 15:01 Zengyi-Qin

I have successfully adapted it for real-time voice. I use the pyaudio library to capture np.float32 audio data from the microphone in real time, pass it to the convert method, and play the converted result back through pyaudio. In my tests the results are very good. I read your paper, and the key algorithm is purely mathematical computation, so feeding it real-time audio input directly seems logically sound. It works well in practice so far, though I don't know whether other problems will turn up. A few key points to note:

When recording with pyaudio, the audio buffer can be set fairly large (10000). This introduces latency to ensure that the subsequent conversion algorithm runs properly.
By default, the convert method reads from a file using librosa. Instead, pass in the data captured by pyaudio directly as an np.float32 array.
When playing back with pyaudio, the audio returned by convert must be converted to bytes to play correctly; otherwise the playback stutters. Code: audio.tobytes().
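
The three points above can be sketched roughly as follows. This is a minimal illustration, not the code from the post: convert_fn is a hypothetical callable standing in for a convert method that has been modified to accept an array instead of a file path, and SAMPLE_RATE is an assumed value that has to match whatever rate the converter actually expects.

```python
import numpy as np

SAMPLE_RATE = 22050        # assumption: must match the converter's sampling rate
FRAMES_PER_BUFFER = 10000  # large buffer, as suggested in the post


def bytes_to_float32(raw: bytes) -> np.ndarray:
    """Interpret raw paFloat32 microphone bytes as a writable float32 array."""
    return np.frombuffer(raw, dtype=np.float32).copy()


def float32_to_bytes(audio) -> bytes:
    """Turn converter output into raw bytes for a paFloat32 output stream."""
    return np.ascontiguousarray(audio, dtype=np.float32).tobytes()


def run_realtime(convert_fn, seconds: float = 10.0) -> None:
    """Capture, convert, and play audio one buffer at a time.

    convert_fn is a hypothetical callable (float32 array -> float32 array)
    wrapping the modified convert method; it is not part of OpenVoice's API.
    """
    import pyaudio  # imported here so the helpers above work without audio hardware

    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paFloat32, channels=1, rate=SAMPLE_RATE,
                  input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    speaker = pa.open(format=pyaudio.paFloat32, channels=1, rate=SAMPLE_RATE,
                      output=True)
    try:
        for _ in range(int(seconds * SAMPLE_RATE / FRAMES_PER_BUFFER)):
            # Tolerate input overflow while the previous buffer is being converted.
            raw = mic.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
            converted = convert_fn(bytes_to_float32(raw))
            speaker.write(float32_to_bytes(converted))
    finally:
        for stream in (mic, speaker):
            stream.stop_stream()
            stream.close()
        pa.terminate()
```

Calling run_realtime(lambda chunk: chunk) simply echoes the microphone, which is a quick way to check the audio path before plugging in the converter. With the assumed 22050 Hz rate, a 10000-frame buffer adds roughly 0.45 s of latency per buffer, before any conversion time.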

Jan 08 '24 08:01 printlin

You can refer to my blog for the relevant code: https://blog.csdn.net/Print_lin/article/details/135478576

Jan 09 '24 07:01 printlin
