vall-e
Failed during inference [SyntaxError: well trained model shouldn't reach here.]
I get an error like this:
2023-10-19 10:10:09,510 INFO [infer.py:224] synthesize text: Selamat pagi
2023-10-19 10:10:09,513 WARNING [words_mismatch.py:88] words count mismatch on 500.0% of the lines (5/1)
2023-10-19 10:10:09,516 WARNING [words_mismatch.py:88] words count mismatch on 400.0% of the lines (4/1)
Traceback (most recent call last):
  File "bin/infer.py", line 282, in <module>
    main()
  File "/media/de3fd1ee-a8c4-4153-9cf5-d642327ff6d0/TTS/valle/valle_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "bin/infer.py", line 251, in main
    encoded_frames = model.inference(
  File "/media/de3fd1ee-a8c4-4153-9cf5-d642327ff6d0/TTS/valle/vall-e/valle/models/valle.py", line 1050, in inference
    raise SyntaxError(
SyntaxError: well trained model shouldn't reach here.
How can I solve it? I trained both the AR and NAR stages following the instructions here: https://github.com/lifeiteng/vall-e#:~:text=LibriTTS%20demo%20Trained%20on%20one%20GPU%20with%2024G%20memory
It means that the AR model could not predict the EOS token, which implies that it was not trained well. Do you know if this happens with other examples too? Btw, does the loss curve of the AR training look ok?
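To make the failure mode concrete, here is a minimal sketch of how an autoregressive decoding loop can guard against a model that does not emit EOS properly. This is not the code in valle.py; `ar_decode`, `toy_model`, `DecodingError`, `EOS_ID` and the `max_ratio` budget are all hypothetical names and values chosen for illustration, and the exact condition checked at line 1050 is not shown in this thread.

```python
from typing import Callable, List

EOS_ID = 1024  # hypothetical EOS token id, for illustration only


class DecodingError(RuntimeError):
    """Raised when the AR loop ends without a usable token sequence."""


def ar_decode(
    next_token: Callable[[List[int]], int],
    text_len: int,
    max_ratio: int = 16,  # assumed length budget per input token
) -> List[int]:
    """Sample tokens one at a time until EOS, within a fixed step budget."""
    max_steps = text_len * max_ratio
    generated: List[int] = []
    for _ in range(max_steps):
        token = next_token(generated)
        if token == EOS_ID:
            if not generated:
                # EOS on the very first step: nothing was synthesized.
                raise DecodingError("EOS predicted before any audio token")
            return generated
        generated.append(token)
    # Budget exhausted: the model never predicted EOS.
    raise DecodingError(f"no EOS within {max_steps} steps")


def toy_model(generated: List[int]) -> int:
    """Stand-in for a trained model: emits three tokens, then EOS."""
    return EOS_ID if len(generated) >= 3 else 7 + len(generated)


print(ar_decode(toy_model, text_len=2))  # [7, 8, 9]
```

A model that keeps returning ordinary tokens runs out of budget and raises, and so does one that returns EOS immediately. Either way the loop has no usable output, which is why checking the AR loss curve and trying a few other prompts is a reasonable first step.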
I have the same problem. When I tested with a short audio prompt (3s or 4s), the output was still good. However, in other cases the model either failed with this error or gave a bad result. Could you guys help me fix it?