
RuntimeError: failed to load voice "ja"

Open zachysaur opened this issue 1 year ago • 14 comments

```
(venv) F:\maskgct\maskgct>python app.py
./models/tts/maskgct/g2p\sources\g2p_chinese_model\poly_bert_model.onnx
Error: Could not load the specified mbrola voice file.
Error: Could not load the specified mbrola voice file.
Traceback (most recent call last):
  File "F:\maskgct\maskgct\app.py", line 20, in <module>
    from models.tts.maskgct.g2p.g2p_generation import g2p, chn_eng_g2p
  File "F:\maskgct\maskgct\models\tts\maskgct\g2p\g2p_generation.py", line 10, in <module>
    from models.tts.maskgct.g2p.utils.g2p import phonemizer_g2p
  File "F:\maskgct\maskgct\models\tts\maskgct\g2p\utils\g2p.py", line 30, in <module>
    phonemizer_ja = EspeakBackend(
  File "F:\maskgct\maskgct\venv\lib\site-packages\phonemizer\backend\espeak\espeak.py", line 49, in __init__
    self._espeak.set_voice(language)
  File "F:\maskgct\maskgct\venv\lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 249, in set_voice
    raise RuntimeError(  # pragma: nocover
RuntimeError: failed to load voice "ja"

(venv) F:\maskgct\maskgct>
```
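
For anyone hitting this, the failure is independent of the MaskGCT code itself: the traceback bottoms out in phonemizer constructing its espeak backend for Japanese (g2p.py line 30). A minimal sketch to reproduce just that step, assuming phonemizer and espeak-ng are installed (the sample text is only a hypothetical smoke test):

```python
# Reproduce only the failing step from the traceback above, isolated from
# MaskGCT. If this raises RuntimeError: failed to load voice "ja", the
# problem is in the phonemizer/espeak-ng setup, not in the MaskGCT code.
from phonemizer.backend import EspeakBackend

backend = EspeakBackend("ja")             # same call that fails at g2p.py line 30
print(backend.phonemize(["こんにちは"]))    # hypothetical sample text
```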

zachysaur avatar Nov 01 '24 11:11 zachysaur

Hi, MaskGCT is built in a Linux environment. For a better experience, it is recommended to use Linux to reproduce the results.

For people who are having problems configuring the environment on a Windows machine, you can try following this blog post: https://www.cnblogs.com/v3ucn/p/18511187

yuantuo666 avatar Nov 02 '24 04:11 yuantuo666

@zachysaur I had the same issue on Windows. The problem was solved by replacing the phonemizer files with the ones from this fixed commit: https://github.com/bootphon/phonemizer/tree/b2db56adceef42b9a20c8ffb4d49868f630b88a1/phonemizer

After that, if you get a character Unicode error, just turn on the UTF-8 (beta) option for non-Unicode programs in the regional and language settings. [screenshot: SNAG-0000]

If you get an mbrola DLLs error, put the two files from the attached mbrola.zip into:

C:\Program Files (x86)\eSpeak\command_line

It should now work.
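
If you want to confirm the swap took effect, here is a quick sanity-check sketch (assuming your phonemizer version exposes `EspeakBackend.supported_languages()`):

```python
# Sanity check after replacing the phonemizer files: confirm which copy of
# phonemizer Python imports and whether espeak-ng exposes a Japanese voice.
import phonemizer
from phonemizer.backend import EspeakBackend

print(phonemizer.__file__)                       # should point at the replaced files
languages = EspeakBackend.supported_languages()  # language code -> voice name
print("ja" in languages, languages.get("ja"))    # True if a Japanese voice is visible
```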

zelenooki87 avatar Nov 02 '24 16:11 zelenooki87

Still the same error, even after following everything in the blog post:

```
./models/tts/maskgct/g2p\sources\g2p_chinese_model\poly_bert_model.onnx
2024-11-03 08:38:00.9068680 [E:onnxruntime:Default, provider_bridge_ort.cc:1862 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1539 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "F:\gct\Amphion\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

2024-11-03 08:38:00.9208389 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:993 onnxruntime::python::CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*, and the latest MSVC runtime. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.

Error: Could not load the specified mbrola voice file.
Error: Could not load the specified mbrola voice file.
Traceback (most recent call last):
  File "F:\gct\Amphion\1.py", line 1, in <module>
    from models.tts.maskgct.maskgct_utils import *
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\gct\Amphion\models\tts\maskgct\maskgct_utils.py", line 20, in <module>
    from models.tts.maskgct.g2p.g2p_generation import g2p, chn_eng_g2p
  File "F:\gct\Amphion\models\tts\maskgct\g2p\g2p_generation.py", line 10, in <module>
    from models.tts.maskgct.g2p.utils.g2p import phonemizer_g2p
  File "F:\gct\Amphion\models\tts\maskgct\g2p\utils\g2p.py", line 30, in <module>
    phonemizer_ja = EspeakBackend(
                    ^^^^^^^^^^^^^^
  File "F:\gct\Amphion\venv\Lib\site-packages\phonemizer\backend\espeak\espeak.py", line 49, in __init__
    self._espeak.set_voice(language)
  File "F:\gct\Amphion\venv\Lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 249, in set_voice
    raise RuntimeError(  # pragma: nocover
RuntimeError: failed to load voice "ja"
```

zachysaur avatar Nov 03 '24 03:11 zachysaur

You could try this repo; it worked correctly for me on Windows: https://github.com/justinjohn0306/MaskGCT-Windows

The error message says you don't have CUDA 12.x, cuDNN (plus zlib.dll), and the MSVC build tools on PATH. If that's the case, you can install the default (CPU) onnxruntime. Or, if you install all the CUDA dependencies properly and add them to the PATH variable, you can install onnxruntime-gpu for faster inference. Uninstall PyTorch and reinstall the GPU build. That's all.
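
As a rough way to check which ONNX Runtime execution providers actually load in the venv (the model path below is the one printed in the log above; adjust it to your checkout), something like this:

```python
# get_available_providers() lists the providers built into the installed
# onnxruntime wheel; creating a session shows which ones actually load.
# The CUDAExecutionProvider warning from the log above shows up here if the
# CUDA 12.x / cuDNN 9.x DLLs (and zlib.dll) are not on PATH.
import onnxruntime as ort

print(ort.get_available_providers())

session = ort.InferenceSession(
    "./models/tts/maskgct/g2p/sources/g2p_chinese_model/poly_bert_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())   # providers actually selected for this session
```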

zelenooki87 avatar Nov 03 '24 09:11 zelenooki87

How is the quality of your local generations? Did you try to recreate the demo page examples? Mine outputs lower quality than those ones. For instance, the whisper voice comes out as something between a whisper and a low voice: https://github.com/open-mmlab/Amphion/issues/334

GalenMarek14 avatar Nov 07 '24 01:11 GalenMarek14

I've got everything on PATH and have used every possible tool out there. Your repo throws some errors; you need to fix them.

zachysaur avatar Nov 07 '24 01:11 zachysaur

I am just another user. I was asking about their experience since I am also getting lower-quality results, though I somehow got it working with zelenooki87's method.

GalenMarek14 avatar Nov 07 '24 02:11 GalenMarek14

With your experience, you're telling someone to mess up his whole setup and remove everything.

My YouTube channel: https://www.youtube.com/@socialapps1194

zachysaur avatar Nov 07 '24 03:11 zachysaur

...Dude, were you sleepy when you were reading my comments? When did I ask you to do anything? zelenooki87 shared his method for working with this on Win 11, and I got it working, but my version outputs somehow wonky results, so I was asking for his observations. I wasn't asking you to do anything; I was just asking him.

I don't know if the models are different or if this method messes something up, but I couldn't reproduce the demo page examples with the same quality.

Also, FYI, you don't need to modify or remove anything. You can just try this in another environment in a separate folder if you're curious. Not that I'm asking you to, though...

GalenMarek14 avatar Nov 07 '24 14:11 GalenMarek14

The problem happens because of how the espeak-ng voice list is structured. If you run `espeak-ng --voices`, you can see that for every language except 'ja' (Japanese) there is one normal voice listed before the multiple mbrola variants. So you can find the built-in voice-lookup function in the phonemizer espeak wrapper and modify it.

[screenshot: espeak-ng --voices output]

You can change it like this at line 240:

[screenshot: the modified wrapper.py]
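
To see what your espeak-ng install reports for Japanese before touching the wrapper, here is a small sketch (it assumes the `espeak-ng` executable is on PATH):

```python
# Show only the Japanese entries of the espeak-ng voice list. If there is no
# plain "ja" voice (only mbrola variants, or nothing at all), phonemizer's
# voice lookup fails exactly as in the tracebacks above.
import subprocess

result = subprocess.run(["espeak-ng", "--voices=ja"], capture_output=True, text=True)
print(result.stdout)
```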

Ezru1 avatar Feb 10 '25 00:02 Ezru1

You're a god!

tailuchen avatar Feb 10 '25 10:02 tailuchen

Can you tell me what I'm supposed to run to get to the second pic?

garimitsu avatar Mar 01 '25 18:03 garimitsu

Just find the library's source file in your environment. [screenshot]
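
If the screenshot is hard to read, you can also ask Python for the exact path of the file to edit (the module path is the one shown in the tracebacks above):

```python
# Print the location of the phonemizer espeak wrapper that raises the
# RuntimeError, so you can open it and inspect the voice-selection code
# around line 240.
import phonemizer.backend.espeak.wrapper as espeak_wrapper

print(espeak_wrapper.__file__)   # e.g. ...\site-packages\phonemizer\backend\espeak\wrapper.py
```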

Ezru1 avatar Mar 01 '25 18:03 Ezru1

If you can't find the location on your local C: drive, search for 'wrapper'. Open it in Visual Studio (or any editor), change line 240 following the instructions provided by @Ezru1, and the problem will be solved!

You can use Japanese voices!

Alesdanz avatar Mar 05 '25 02:03 Alesdanz