
Kokoro-multilang-v1 TTS model produces broken speech after conversion to MNN.

Open geniusnut opened this issue 2 months ago • 0 comments

I converted the model with MNNConvert:

../../build/MNNConvert -f ONNX --modelFile model.onnx --MNNModel model.mnn

I then ran synthesis with the converted model:

./build/bin/sherpa-mnn-offline-tts \
  --kokoro-model=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/model.mnn \
  --kokoro-voices=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/voices.bin \
  --kokoro-tokens=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/tokens.txt \
  --kokoro-data-dir=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/espeak-ng-data \
  --kokoro-dict-dir=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/dict \
  --kokoro-lexicon=../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/lexicon-us-en.txt,../../../sherpa-mnn-models/kokoro-multi-lang-v1_1/lexicon-zh.txt \
  --num-threads=2 \
  --sid=10 \
  --output-filename="./kokoro-11.wav" \
  "在AI圈,每次新的技术浪潮来袭,总能激起我们内心深处对未来的无限遐想。而就在最近,小米AI实验室的新一代Kaldi团队,悄然投下了一枚重磅炸弹——他们发布的ZipVoice系列语音合成(TTS)模型,不光是技术上的精进,更像是在这片领域吹响了一场“轻量化”革命的号角。"

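The WAV file was generated successfully.

In the attached image, the left side is the output of the MNN model and the right side is the normal WAV generated with ONNX. Some passages in the MNN-generated WAV are mispronounced. The incorrect speech at 17-20 s sounds like this: broken-17-20.wav

The process also looks slightly different between the two. Did I miss something?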
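To pin down which stretches of the two outputs differ (such as the 17-20 s region), a rough comparison can be sketched as below. This is only a minimal sketch under stated assumptions: both files are assumed to be 16-bit PCM WAVs with the same sample rate, the helper names (`read_mono_pcm16`, `window_rms`, `divergent_windows`) are hypothetical and not part of MNN or sherpa, and comparing per-window RMS energy is a crude heuristic that will also flag harmless timing shifts.

```python
import math
import struct
import wave


def read_mono_pcm16(path):
    """Return (sample_rate, samples) for a 16-bit PCM WAV, keeping channel 0 only."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM: " + path)  # assumption of this sketch
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, list(values[::channels])


def window_rms(samples, rate, window_s=1.0):
    """Return the RMS energy of each consecutive window of window_s seconds."""
    size = max(1, int(rate * window_s))
    out = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return out


def divergent_windows(path_a, path_b, window_s=1.0, rel_tol=0.5):
    """Return (start_s, end_s) windows whose RMS energy differs by more than rel_tol."""
    rate_a, a = read_mono_pcm16(path_a)
    rate_b, b = read_mono_pcm16(path_b)
    if rate_a != rate_b:
        raise ValueError("sample rates differ")
    flagged = []
    pairs = zip(window_rms(a, rate_a, window_s), window_rms(b, rate_b, window_s))
    for i, (x, y) in enumerate(pairs):
        ref = max(x, y, 1.0)  # floor of 1.0 avoids dividing by zero on silence
        if abs(x - y) / ref > rel_tol:
            flagged.append((i * window_s, (i + 1) * window_s))
    return flagged
```

For example, `divergent_windows("kokoro-11.wav", "kokoro-onnx.wav")` would list the one-second windows in which the two files disagree most. Here `kokoro-11.wav` is the MNN output from the command above, while `kokoro-onnx.wav` is a hypothetical name for the ONNX reference output.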

geniusnut, Oct 30 '25 09:10