Generating speech in Russian with C# returns nonsense

Open Onkitova opened this issue 9 months ago • 6 comments

Hello and thank you so much for this great repo!

Unfortunately, I came across a very strange bug. When I use the standalone application "sherpa-onnx-non-streaming-tts-x64-v1.9.23.exe" from the release, everything works. But as soon as I generate speech from C# with the same model, tokens file, and espeak-ng-data, the output is gibberish (it sounds like feeding non-English text to an English TTS model). There are no errors or debug messages. The file is generated successfully, but the speech is nonsensical. The bug is reproducible even in the provided example project "dotnet-examples/offline-tts".

Onkitova avatar May 05 '24 21:05 Onkitova

Could you describe which model you are using?

csukuangfj avatar May 05 '24 23:05 csukuangfj

Could you describe which model you are using?

Of course! To be clear: I can reproduce it with each of the 4 Russian-language models. For the sake of analysis, though, let's say it is:

  • vits-piper-ru_RU-irina-medium.tar.bz2

Here is how it sounds with sherpa-onnx-non-streaming-tts-x64-v1.9.23.exe.

And here is what I got with C# (I tried both ways: running from a .bat file and directly from code). It is also absurdly long, about 6 times longer than the correct version.

One more thing I can tell you is that the vocal output is not totally random. I can definitely hear all 4 models speak the same thing, just not in Russian. It looks as if the C# sample lacks some pointer to the specific language (needed to pick the related espeak-ng-data), while sherpa-onnx-non-streaming-tts-x64-v1.9.23.exe somehow manages to get it right by itself.

Onkitova avatar May 06 '24 02:05 Onkitova

Sorry, I cannot access your link.


I just tested it locally on my macOS and it works perfectly.

dotnet run \
  --vits-model=./vits-piper-ru_RU-irina-medium/ru_RU-irina-medium.onnx \
  --vits-tokens=./vits-piper-ru_RU-irina-medium/tokens.txt \
  --vits-data-dir=./vits-piper-ru_RU-irina-medium/espeak-ng-data \
  --debug=1 \
  --output-filename=./hi.wav \
  --text="Как твои дела?"

It produces hi.wav.txt

(Please rename it to hi.wav.)

Please make sure you use utf-8 encoding for your computer.

Please see the doc https://k2-fsa.github.io/sherpa/onnx/tts/faq.html#how-to-enable-utf-8-on-windows

Sorry that this specific part is in Chinese. Many Chinese users had issues with the Chinese TTS models until they changed their computers to use UTF-8 encoding.

csukuangfj avatar May 06 '24 02:05 csukuangfj

Wow, enforcing utf-8 actually helped! Thank you!

One final question, then: do you have any idea whether there is a way to do the same without touching Windows settings? I mean, can I somehow enforce UTF-8 encoding from code when passing the options.text parameter to sherpa? Or could I pass not a string but a path to a text file containing the text to be spoken, while ensuring that this specific file is UTF-8 encoded? I am asking because sherpa-onnx-non-streaming-tts-x64-v1.9.23.exe managed to get it right without any changes to Windows settings.

Onkitova avatar May 06 '24 03:05 Onkitova

The C++ code expects strings in UTF-8 encoding.

The code works fine on my macOS without any system changes. I am unsure why it causes issues on your and some other users' systems.

C# uses UTF-16 encoded strings.

The following line https://github.com/k2-fsa/sherpa-onnx/blob/4f758e6cd34c83837a8e269a52edf5b5b4143d2d/scripts/dotnet/offline.cs#L235 does the conversion automagically from UTF-16 to UTF-8.
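
As a minimal sketch of doing that conversion explicitly instead of relying on the marshaller: when a P/Invoke parameter is declared as a plain string with the default (ANSI) character set, .NET on Windows converts it to the active ANSI code page, while on macOS and Linux it converts to UTF-8, which may explain the difference between the two systems. The helper below builds the null-terminated UTF-8 bytes by hand. The class and method names are hypothetical, and passing a byte[] to SherpaOnnxOfflineTtsGenerate is an assumption about how the binding could be declared, not its confirmed signature.

```csharp
using System;
using System.Text;

public static class Utf8Interop
{
    // Convert a .NET (UTF-16) string into a null-terminated UTF-8 byte array,
    // the layout a C/C++ function taking `const char *` expects.
    public static byte[] ToNullTerminatedUtf8(string text)
    {
        int byteCount = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = new byte[byteCount + 1]; // the last byte stays 0
        Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);
        return buffer;
    }

    public static void Main()
    {
        byte[] utf8 = ToNullTerminatedUtf8("Как твои дела?");
        Console.WriteLine(BitConverter.ToString(utf8));

        // Hypothetical usage, assuming a P/Invoke declaration whose text
        // parameter is declared as byte[] instead of string:
        //   SherpaOnnxOfflineTtsGenerate(handle, utf8, sid, speed);
    }
}
```

Another option that may work is annotating the string parameter with `[MarshalAs(UnmanagedType.LPUTF8Str)]`, which asks the runtime to marshal the string as UTF-8 regardless of the system code page. It is untested against this binding.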


but without touching windows settings

Sorry, I've no idea about how to fix it by changing the code (if it is indeed caused by code).


You can try reading the text from a utf-8 encoded file and see if it works.

csukuangfj avatar May 06 '24 03:05 csukuangfj

The code works fine on my macOS without any system changes. I am unsure why it causes issues on your and some other users' systems.

I can also suppress this issue for myself by enforcing UTF-8 (code page 65001) system-wide, following the instructions from the link you provided. Thanks for that, once again. But I also want to embed sherpa-onnx into software that I share with other people, and it would be incredibly awkward to ask every potential user to "go here, click that, then reboot", or to tell them "I changed your registry a little, so now you need to reboot before the application works as intended". That's why I keep looking for a solution.

C# uses UTF-16 encoded strings. The following line https://github.com/k2-fsa/sherpa-onnx/blob/4f758e6cd34c83837a8e269a52edf5b5b4143d2d/scripts/dotnet/offline.cs#L235 does the conversion automagically from UTF-16 to UTF-8.

As I understand it, the problem is not in the C++ code but on the C# side: when the system is not set to force everything to UTF-8 (code page 65001), the UTF-16 strings are converted incorrectly for Cyrillic characters (and possibly Chinese characters too).
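
One plausible mechanism for this can be sketched as follows: if the text reaches the native side encoded in a legacy code page (Windows-1251 is used here purely as an example) and is then read as UTF-8, most bytes become invalid sequences. The sketch assumes .NET 5 or later, where `CodePagesEncodingProvider` is available without an extra package. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class CodePageMismatch
{
    static CodePageMismatch()
    {
        // Legacy code pages such as 1251 must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encode text with a legacy code page, then decode the bytes as UTF-8,
    // mimicking a consumer that assumes its input is UTF-8.
    public static string MisreadAsUtf8(string text, int codePage)
    {
        byte[] legacyBytes = Encoding.GetEncoding(codePage).GetBytes(text);
        return Encoding.UTF8.GetString(legacyBytes);
    }

    public static void Main()
    {
        string text = "Как твои дела?";
        string misread = MisreadAsUtf8(text, 1251);

        // Count U+FFFD replacement characters produced by invalid UTF-8 input.
        int replaced = 0;
        foreach (char c in misread)
        {
            if (c == '\uFFFD') replaced++;
        }

        Console.WriteLine("UTF-8 bytes:  " + BitConverter.ToString(Encoding.UTF8.GetBytes(text)));
        Console.WriteLine("CP1251 bytes: " + BitConverter.ToString(Encoding.GetEncoding(1251).GetBytes(text)));
        Console.WriteLine("Replacement characters after misreading: " + replaced);
    }
}
```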

You can try reading the text from a utf-8 encoded file and see if it works.

I tried a lot: files, encoding tricks, and so on. Unfortunately, I found no remedy here.

Could you please consider adding another variant of SherpaOnnxOfflineTtsGenerate, for example SherpaOnnxOfflineTtsGenerateFromFile, whose string argument is not the literal text to be voiced but a path to a text file, from which the C++ code reads the text? That way, I think we can get around this problem by simply using a text file with fixed UTF-8 encoding as a proxy.
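
The C# half of that proposal could look roughly like the sketch below, which writes the text to a temporary file as UTF-8 without a byte-order mark. SherpaOnnxOfflineTtsGenerateFromFile is only the function proposed above and does not exist in the library as far as this thread shows. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8TextFile
{
    // Write text as UTF-8 without a byte-order mark and return the file path.
    public static string WriteUtf8(string text)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".txt");
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
        return path;
    }

    public static void Main()
    {
        string path = WriteUtf8("Как твои дела?");
        byte[] onDisk = File.ReadAllBytes(path);
        Console.WriteLine(path + ": " + onDisk.Length + " bytes");

        // Hypothetical call to the proposed native function:
        //   SherpaOnnxOfflineTtsGenerateFromFile(handle, path, sid, speed);

        File.Delete(path);
    }
}
```

One caveat with this design: the path is itself a string crossing the same boundary, so a path containing non-ASCII characters (for example a Cyrillic user name in the temp directory) could run into the same encoding problem.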

Onkitova avatar May 06 '24 06:05 Onkitova