TTS WebAssembly does not work for other languages

Open kmpartner opened this issue 1 year ago • 10 comments

I tried to follow the instructions to build text-to-speech with WebAssembly (https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html). With the English model used in the instructions, it worked well.

But when I tried models for other languages, for example https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-mls-medium.tar.bz2, it did not generate correct audio.

(Single-speaker models seem to work, but multi-speaker models do not work well.)

How can I solve this problem?

kmpartner avatar May 08 '24 23:05 kmpartner

Please tell us what you have done with the German tts model.

For "not work well", could you describe in detail what it means?

csukuangfj avatar May 09 '24 01:05 csukuangfj

Thank you for the reply. I just followed the documentation page (https://k2-fsa.github.io/sherpa/onnx/tts/wasm/build.html) by changing the URL for wget.

The page was displayed successfully, but when I tried to generate German speech from the text "Heute ist ein guter Tag. Gestern war ein guter Tag.", it generated strange audio for every speaker ID I tested (5 to 6 different IDs).

When I used a single-speaker model (I do not remember which one), the generated voice had no problem.

kmpartner avatar May 11 '24 12:05 kmpartner

> by changing URL for wget

Could you describe in detail what you have done?

csukuangfj avatar May 11 '24 14:05 csukuangfj

I tried both wget and a manual download from the models list:

wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-mls-medium.tar.bz2

Then I did the following:

1. Extract the downloaded archive.
2. Copy the .onnx file, tokens, and espeak-ng-data to the asset folder, and rename the .onnx file to model.onnx.
3. Delete the old contents of the build-wasm-simd-tts folder.
4. Run build-wasm-simd-tts.sh.
5. Test the page.

The voice generated from "Heute ist ein guter Tag. Gestern war ein guter Tag." is very long (about 20 seconds) and sounds strange.
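For anyone repeating these steps, the copy-and-rename part can be sketched as a small shell function, which helps rule out a staging mistake (for example, a leftover file from a previously staged model). This is only a sketch under assumptions not confirmed in this thread: the function name `stage_tts_model` is hypothetical, the tokens file is assumed to be named `tokens.txt`, and the extracted folder is assumed to contain exactly one `.onnx` file.

```shell
# Sketch: stage an extracted model folder into the asset folder used by the
# WebAssembly TTS build. Assumptions (hypothetical, not from the thread):
# the tokens file is called tokens.txt and there is exactly one .onnx file.
stage_tts_model() (
  model_dir=$1   # folder produced by extracting the downloaded archive
  assets_dir=$2  # asset folder that the build script reads from

  # Expect exactly one .onnx file; an unmatched glob stays literal and fails -f.
  set -- "$model_dir"/*.onnx
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one .onnx file in $model_dir" >&2
    exit 1
  fi
  onnx_file=$1

  # Both the tokens file and the espeak-ng-data folder must be present.
  for required in "$model_dir/tokens.txt" "$model_dir/espeak-ng-data"; do
    if [ ! -e "$required" ]; then
      echo "missing: $required" >&2
      exit 1
    fi
  done

  # Remove files staged for an earlier model so the two cannot be mixed.
  mkdir -p "$assets_dir"
  rm -rf "$assets_dir/model.onnx" "$assets_dir/tokens.txt" \
         "$assets_dir/espeak-ng-data"

  # Copy the model under the fixed name model.onnx, plus its companion files.
  cp "$onnx_file" "$assets_dir/model.onnx"
  cp "$model_dir/tokens.txt" "$assets_dir/tokens.txt"
  cp -R "$model_dir/espeak-ng-data" "$assets_dir/espeak-ng-data"
)
```

A call would look like `stage_tts_model <extracted-model-folder> <asset-folder>`, followed by clearing the old build output and running build-wasm-simd-tts.sh as described above. Staging the files correctly only excludes a copy error; it does not help if the model itself produces bad audio.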

kmpartner avatar May 12 '24 00:05 kmpartner

Could you switch to another German model?

I just tested it and found that the model cannot produce correct speech. I am deleting it.

csukuangfj avatar May 12 '24 03:05 csukuangfj

By the way, you can try all German tts models at https://huggingface.co/spaces/k2-fsa/text-to-speech

csukuangfj avatar May 12 '24 04:05 csukuangfj

That is no problem. I am testing it. But I want to know why the multi-speaker model works for English but not for other languages (I also tested a French multi-speaker model, and it generates strange voices). Which files are wrong and cause the strange voices?

kmpartner avatar May 12 '24 04:05 kmpartner

> I tested French multi-speakers model as well, and it generates strange voices

Please tell us the exact model you are using.

Please first test the model at https://huggingface.co/spaces/k2-fsa/text-to-speech

csukuangfj avatar May 12 '24 06:05 csukuangfj

I don't remember exactly, but I think the model was https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-fr_FR-mls-medium.tar.bz2

It is possible that vits-piper models whose names contain "mls(-medium)" do not work well in other languages either.

kmpartner avatar May 13 '24 00:05 kmpartner

I suggest that you don't use any model including mls in its name. I am deleting this model from sherpa-onnx.

csukuangfj avatar May 13 '24 01:05 csukuangfj