[Bug] TTS failed on Chinese language
I ran the TTS function test; however, it first throws a warning:
/.local/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
and then it crashes during voice generation:
[2023-02-27 21:12:03,865] ERROR in app: Exception on /tts [POST]
Traceback (most recent call last):
File "/.local/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/speech-rest-api/app.py", line 90, in generate_tts
tmp_path_wav = run_tts_and_save_file(sentence)
File "/speech-rest-api/app.py", line 44, in run_tts_and_save_file
mel_outputs, mel_length, alignment = tacotron2.encode_batch([text])
File "/.local/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 2663, in encode_batch
mel_outputs_postnet, mel_lengths, alignments = self.infer(
File "/.local/lib/python3.10/site-packages/speechbrain/lobes/models/Tacotron2.py", line 1511, in infer
embedded_inputs = self.embedding(inputs).transpose(1, 2)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "//.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
127.0.0.1 - - [27/Feb/2023 21:12:03] "POST /tts HTTP/1.1" 500 -
I believe it is a problem with a forced data-format conversion; however, I don't have time to dig into it at the moment.
I have run a couple of tests. It now turns out to be a problem with the Chinese language. The error message persists:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
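For reference, the crash can likely be reproduced outside the Flask app with the pretrained English LJSpeech Tacotron2 that the traceback points to. A minimal sketch follows (the exact model source is an assumption; the repo may load it differently): the English symbol table presumably has no entry for Chinese characters, so the encoded index sequence comes back empty and defaults to a float tensor, which `F.embedding` then rejects.

```python
# Minimal repro sketch, outside the Flask app.
# Assumption: the same pretrained English model the traceback points to,
# speechbrain/tts-tacotron2-ljspeech (the repo may load it differently).
from speechbrain.pretrained import Tacotron2

tacotron2 = Tacotron2.from_hparams(
    source="speechbrain/tts-tacotron2-ljspeech",
    savedir="tmp_tacotron2",
)

# English text maps to known symbols -> integer indices -> works.
tacotron2.encode_batch(["Voice test"])

# Chinese characters are not in the English symbol table, so the encoded
# sequence is effectively empty and ends up as a float tensor, triggering:
# "Expected tensor for argument #1 'indices' ... Long, Int;
#  but got torch.FloatTensor instead".
tacotron2.encode_batch(["语音测试"])
```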
Can you give me the prompt which crashes the bot, so I can reproduce this issue and fix it?
The TTS endpoint only supports English at the moment.
Just random Chinese sentences, like "语音测试" (voice test).
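Until the endpoint supports other languages, one possible mitigation is to reject non-English input before it ever reaches the model. A rough sketch against the existing app.py: the handler and helper names come from the traceback, while the `text` field name and the ASCII check are assumptions for illustration only.

```python
# Sketch of an input guard for the /tts route. generate_tts and
# run_tts_and_save_file come from the traceback; `app` and the helper
# already exist in speech-rest-api/app.py. The "text" field name and
# the ASCII heuristic are assumptions, not the repo's actual code.
from flask import jsonify, request, send_file

@app.route("/tts", methods=["POST"])
def generate_tts():
    sentence = request.json.get("text", "")
    if not sentence.isascii():
        # The pretrained LJSpeech Tacotron2 only knows English symbols,
        # so bail out with a 400 instead of crashing with a 500.
        return jsonify(error="TTS currently supports English text only"), 400
    tmp_path_wav = run_tts_and_save_file(sentence)  # existing helper in app.py
    return send_file(tmp_path_wav, mimetype="audio/wav")
```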
Okay... I have noticed it also doesn't work for Korean or Japanese (#104). Anyway, thanks for your precious work!
Thank you, I will try it out and try to fix it, or at least keep it from crashing the bot.
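For the "not crash the bot" part, one small option could be to catch the low-level RuntimeError around `encode_batch` and surface a readable error instead. The wrapper below is hypothetical (`encode_or_fail` does not exist in the repo); only `encode_batch` and the error it raises come from the traceback.

```python
# Hypothetical wrapper (encode_or_fail is not a real function in the repo):
# translate the dtype RuntimeError from Tacotron2 into a clear error that
# the /tts handler can catch and map to a 400 response.
def encode_or_fail(tacotron2, sentence: str):
    try:
        return tacotron2.encode_batch([sentence])
    except RuntimeError as exc:
        # Raised for text the English symbol table cannot encode,
        # e.g. Chinese, Japanese or Korean input.
        raise ValueError(
            f"TTS cannot synthesize this text (English only for now): {sentence!r}"
        ) from exc
```

The /tts handler could then catch ValueError and return a 400 with that message, so the WhatsApp bot can reply with an explanation rather than failing on a 500.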