[Bug] TTS failed on Chinese language
I ran the TTS function test; however, it first throws a warning:
/.local/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
and then it crashes during voice generation:
[2023-02-27 21:12:03,865] ERROR in app: Exception on /tts [POST]
Traceback (most recent call last):
File "/.local/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/.local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/speech-rest-api/app.py", line 90, in generate_tts
tmp_path_wav = run_tts_and_save_file(sentence)
File "/speech-rest-api/app.py", line 44, in run_tts_and_save_file
mel_outputs, mel_length, alignment = tacotron2.encode_batch([text])
File "/.local/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 2663, in encode_batch
mel_outputs_postnet, mel_lengths, alignments = self.infer(
File "/.local/lib/python3.10/site-packages/speechbrain/lobes/models/Tacotron2.py", line 1511, in infer
embedded_inputs = self.embedding(inputs).transpose(1, 2)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "//.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
127.0.0.1 - - [27/Feb/2023 21:12:03] "POST /tts HTTP/1.1" 500 -
I believe it is a problem with a forced data-format conversion; however, I don't have time to dig into it at the moment.
I have run a couple of tests. It now turns out to be a problem with the Chinese language. The error message persists:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
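For reference, the crash can likely be reproduced outside the Flask app with the pretrained English LJSpeech Tacotron2 that the traceback points to. A minimal sketch follows (the exact model source is an assumption; the repo may load it differently): the English symbol table presumably has no entry for Chinese characters, so the encoded index sequence comes back empty and defaults to a float tensor, which `F.embedding` then rejects.

```python
# Minimal repro sketch, outside the Flask app.
# Assumption: the same pretrained English model the traceback points to,
# speechbrain/tts-tacotron2-ljspeech (the repo may load it differently).
from speechbrain.pretrained import Tacotron2

tacotron2 = Tacotron2.from_hparams(
    source="speechbrain/tts-tacotron2-ljspeech",
    savedir="tmp_tacotron2",
)

# English text maps to known symbols -> integer indices -> works.
tacotron2.encode_batch(["Voice test"])

# Chinese characters are not in the English symbol table, so the encoded
# sequence is effectively empty and ends up as a float tensor, triggering:
# "Expected tensor for argument #1 'indices' ... Long, Int;
#  but got torch.FloatTensor instead".
tacotron2.encode_batch(["语音测试"])
```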
Can you give me the prompt which crashes the bot, so I can reproduce this issue and fix it?
The TTS endpoint only supports English at the moment.
Just random Chinese sentences, like "语音测试" (voice test).
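Until the endpoint supports other languages, one possible mitigation is to reject non-English input before it ever reaches the model. A rough sketch against the existing app.py: the handler and helper names come from the traceback, while the `text` field name and the ASCII check are assumptions for illustration only.

```python
# Sketch of an input guard for the /tts route. generate_tts and
# run_tts_and_save_file come from the traceback; `app` and the helper
# already exist in speech-rest-api/app.py. The "text" field name and
# the ASCII heuristic are assumptions, not the repo's actual code.
from flask import jsonify, request, send_file

@app.route("/tts", methods=["POST"])
def generate_tts():
    sentence = request.json.get("text", "")
    if not sentence.isascii():
        # The pretrained LJSpeech Tacotron2 only knows English symbols,
        # so bail out with a 400 instead of crashing with a 500.
        return jsonify(error="TTS currently supports English text only"), 400
    tmp_path_wav = run_tts_and_save_file(sentence)  # existing helper in app.py
    return send_file(tmp_path_wav, mimetype="audio/wav")
```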
Okay... I have noticed it also doesn't work for Korean or Japanese (#104). Anyway, thanks for your precious work!
Thank you, I will try it out and try to fix it, or at least keep it from crashing the bot.
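For the "not crash the bot" part, one small option could be to catch the low-level RuntimeError around `encode_batch` and surface a readable error instead. The wrapper below is hypothetical (`encode_or_fail` does not exist in the repo); only `encode_batch` and the error it raises come from the traceback.

```python
# Hypothetical wrapper (encode_or_fail is not a real function in the repo):
# translate the dtype RuntimeError from Tacotron2 into a clear error that
# the /tts handler can catch and map to a 400 response.
def encode_or_fail(tacotron2, sentence: str):
    try:
        return tacotron2.encode_batch([sentence])
    except RuntimeError as exc:
        # Raised for text the English symbol table cannot encode,
        # e.g. Chinese, Japanese or Korean input.
        raise ValueError(
            f"TTS cannot synthesize this text (English only for now): {sentence!r}"
        ) from exc
```

The /tts handler could then catch ValueError and return a 400 with that message, so the WhatsApp bot can reply with an explanation rather than failing on a 500.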