sherpa-onnx
TTS WebAssembly does not work for other languages
I followed the instructions for building text-to-speech with WebAssembly: https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html With the English model used in the instructions, it worked well.
But when I tried a model for a different language, https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-mls-medium.tar.bz2
it did not generate correct audio. (A single-speaker model seems to work, but multi-speaker models do not.)
How can I solve this problem?
Please tell us what you have done with the German tts model.
Could you describe in detail what "not work well" means?
Thank you for the reply. I just followed the documentation page (https://k2-fsa.github.io/sherpa/onnx/tts/wasm/build.html) by changing the URL for wget.
The page was displayed successfully, but when I tried to generate German speech from the text "Heute ist ein guter Tag. Gestern war ein guter Tag.", it generated strange audio for every speaker ID I tested (5 or 6 different IDs).
When I used a single-speaker model (I do not remember which one), the generated voice had no problems.
"by changing URL for wget"
Could you describe in detail what you have done?
I tried both wget and a manual download from the models list.
1. wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-mls-medium.tar.bz2
2. Extract the downloaded archive, copy the .onnx file, tokens, and espeak-ng-data to the asset folder, and rename the .onnx file to model.onnx.
3. Delete the old contents of the build-wasm-simd-tts folder.
4. Run build-wasm-simd-tts.sh.
5. Test the page.
The audio generated from "Heute ist ein guter Tag. Gestern war ein guter Tag." is very long (~20 seconds) and sounds strange.
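For reference, the copy-and-rename step described above can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name `stage_tts_assets` is hypothetical, the token file is assumed to be named `tokens.txt`, and the asset folder path shown in the usage comment is an assumption about the repository layout that this thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sherpa-onnx): copy the files of an
# extracted model directory into the asset folder used by the wasm build.
# Usage: stage_tts_assets MODEL_DIR ASSETS_DIR
stage_tts_assets() {
  local model_dir="$1" assets_dir="$2" onnx

  # Pick the model file; '*.onnx' does not match names like '*.onnx.json'.
  onnx=$(find "$model_dir" -maxdepth 1 -type f -name '*.onnx' | head -n 1)
  if [ -z "$onnx" ]; then
    echo "no .onnx file found in $model_dir" >&2
    return 1
  fi

  mkdir -p "$assets_dir" || return 1

  # The build expects the model to be named model.onnx.
  cp "$onnx" "$assets_dir/model.onnx" || return 1

  # Assumption: the token file is called tokens.txt.
  cp "$model_dir/tokens.txt" "$assets_dir/tokens.txt" || return 1

  # Replace any espeak-ng-data left over from a previous model.
  rm -rf "$assets_dir/espeak-ng-data"
  cp -r "$model_dir/espeak-ng-data" "$assets_dir/espeak-ng-data" || return 1
}

# Example usage (paths are assumptions, adjust to your checkout):
#   wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-mls-medium.tar.bz2
#   tar xf vits-piper-de_DE-mls-medium.tar.bz2
#   stage_tts_assets ./vits-piper-de_DE-mls-medium ./wasm/tts/assets
#   ./build-wasm-simd-tts.sh
```

Staging the files through one function makes it harder to leave a stale model.onnx or espeak-ng-data from the previous model in place. It does not change whether a given model produces correct speech.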
Could you switch to another German model?
I just tested it and found that the model cannot produce correct speech. I am deleting it.
By the way, you can try all German tts models at https://huggingface.co/spaces/k2-fsa/text-to-speech
That is no problem. I am testing it. But I want to know why the multi-speaker model works for English but not for other languages (I tested a French multi-speaker model as well, and it generates strange voices). Which files are causing the strange voices?
"I tested French multi-speakers model as well, and it generates strange voices"
Please tell us the exact model you are using.
Please first test the model at https://huggingface.co/spaces/k2-fsa/text-to-speech
I don't remember exactly, but I think the model was https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-fr_FR-mls-medium.tar.bz2
It is possible that vits-piper models whose names contain "mls(-medium)" also do not work well in other languages.
I suggest that you don't use any model including mls in its name. I am deleting this model from sherpa-onnx.