vall-e fails during inference [SyntaxError: well trained model shouldn't reach here.]

I get an error like this:

2023-10-19 10:10:09,510 INFO [infer.py:224] synthesize text: Selamat pagi
2023-10-19 10:10:09,513 WARNING [words_mismatch.py:88] words count mismatch on 500.0% of the lines (5/1)
2023-10-19 10:10:09,516 WARNING [words_mismatch.py:88] words count mismatch on 400.0% of the lines (4/1)
Traceback (most recent call last):
  File "bin/infer.py", line 282, in <module>
    main()
  File "/media/de3fd1ee-a8c4-4153-9cf5-d642327ff6d0/TTS/valle/valle_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "bin/infer.py", line 251, in main
    encoded_frames = model.inference(
  File "/media/de3fd1ee-a8c4-4153-9cf5-d642327ff6d0/TTS/valle/vall-e/valle/models/valle.py", line 1050, in inference
    raise SyntaxError(
SyntaxError: well trained model shouldn't reach here.

How can I solve this? I trained both the AR and NAR models following the instructions here: https://github.com/lifeiteng/vall-e#:~:text=LibriTTS%20demo%20Trained%20on%20one%20GPU%20with%2024G%20memory

Oct 19 '23 03:10 kin0303

It means that the AR model could not predict the EOS token, which implies that it was not trained well. Does this happen with other examples too? Also, does the loss curve of the AR training look OK?
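To make the explanation above concrete: VALL-E's AR stage generates audio tokens one step at a time and is expected to stop by emitting an end-of-sequence (EOS) token. The toy loop below is a hypothetical sketch, not the code in `valle/models/valle.py`. It replaces the real model with a scripted list of per-step predictions and reports why decoding stopped, which shows the two ways an undertrained AR model can misbehave: never emitting EOS (output runs until a length cap) or emitting EOS before producing any frame. The names `EOS`, `ar_decode` and `max_new_frames` are made up for illustration; the exact condition that raises the `SyntaxError` is the check in `valle.py` near line 1050 and is worth reading directly.

```python
import itertools
from typing import Iterable, List, Tuple

EOS = -1  # hypothetical EOS token id for this toy example


def ar_decode(step_outputs: Iterable[int], max_new_frames: int) -> Tuple[List[int], str]:
    """Toy autoregressive loop that consumes one predicted token per step.

    `step_outputs` stands in for the AR model's per-step predictions.
    Decoding stops on EOS or once `max_new_frames` frames have been produced.
    """
    frames: List[int] = []
    for token in step_outputs:
        if token == EOS:
            # EOS on the very first step means no audio was generated at all.
            reason = "eos" if frames else "eos_before_any_frame"
            return frames, reason
        frames.append(token)
        if len(frames) >= max_new_frames:
            # The model never emitted EOS within the length budget.
            return frames, "length_cap"
    return frames, "exhausted"


print(ar_decode([5, 9, 2, EOS], max_new_frames=100))      # healthy stop on EOS
print(ar_decode([EOS], max_new_frames=100))               # stops before any frame
print(ar_decode(itertools.repeat(7), max_new_frames=4))   # never stops on its own
```

A well-trained AR model should land in the first case for normal inputs; repeatedly hitting either of the other two is consistent with the undertraining diagnosis, which is why checking the AR loss curve is a sensible next step.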

Nov 03 '23 08:11 zero-or-one

I have the same problem. When I tested with a short audio prompt (3 or 4 s), the result was still good. Otherwise, the model either fails or produces bad results. Could you help me fix it?
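Since short prompts (3 to 4 s) reportedly still work, one hypothetical workaround to try is cutting the prompt audio down to that length before inference. The sketch below works on a plain list of samples; the function name `trim_prompt`, the 3 s default and the 24 kHz rate in the usage line are assumptions, and nothing in this thread confirms that trimming avoids the error. Because VALL-E is conditioned on the transcript of the prompt, the matching text prompt would also need to be cut to the words actually spoken in the trimmed audio.

```python
from typing import List, Sequence


def trim_prompt(samples: Sequence[float], sample_rate: int, max_seconds: float = 3.0) -> List[float]:
    """Keep at most `max_seconds` of audio from the start of the prompt."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    max_samples = int(sample_rate * max_seconds)
    return list(samples[:max_samples])


# 10 s of silence at a hypothetical 24 kHz sample rate, cut down to 3 s.
prompt = [0.0] * (24000 * 10)
short = trim_prompt(prompt, 24000)
print(len(short) / 24000)  # 3.0
```

In practice it is better to cut at a pause between words than at an exact sample count, so the trimmed audio and its transcript still line up.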

Dec 27 '23 03:12 thelinhbkhn2014