[Feature]: can MaskGCT process Chinese zero-shot TTS?
When trying inference for Chinese TTS, I get the following error:
RuntimeError: The size of tensor a (1649) must match the size of tensor b (1758) at non-singleton dimension 3
I have chosen the language "zh", so could you let me know:
- does the current MaskGCT support Chinese?
- or what did I do wrong, and how can I fix it?
thank you very much!
Hi, the current MaskGCT supports Chinese (in fact, we support six languages: en, zh, fr, de, kr, ja). Could you give me more details about the error, for example a screenshot?
like this:
```
Traceback (most recent call last):
File "/try/Amphion/test.py", line 120, in <module>
recovered_audio = maskgct_inference_pipeline.maskgct_inference(
File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 261, in maskgct_inference
combine_semantic_code, _ = self.text2semantic(
File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 175, in text2semantic
predict_semantic = self.t2s_model.reverse_diffusion(
File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/try/Amphion/models/tts/maskgct/maskgct_t2s.py", line 292, in reverse_diffusion
mask_embeds = self.diff_estimator(
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 621, in forward
layer_outputs = decoder_layer(
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 173, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniforge3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 378, in forward
attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (1008) must match the size of tensor b (1019) at non-singleton dimension 3
```
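For what it's worth, this looks like the usual broadcasting failure when the causal attention mask is built for a different sequence length than the attention scores. A minimal reproduction of the same error class (the shapes here are just taken from the message above for illustration, not from MaskGCT internals):

```python
import torch

# Illustrative shapes: attention scores computed for a 1008-token
# sequence, but a causal mask prepared for 1019 tokens.
attn_weights = torch.zeros(1, 8, 1008, 1008)
causal_mask = torch.zeros(1, 1, 1019, 1019)

try:
    # same operation as modeling_llama.py: attn_weights + causal_mask
    attn_weights + causal_mask
except RuntimeError as e:
    # sizes 1008 vs 1019 cannot broadcast at dimension 3
    print(e)
```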
This mostly happens when using "zh" as the language or target_language. Sometimes the error disappears when I make the target_text shorter. Is there a setting or recommended limit for the target text length in this work? Thanks for your time!
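Since shortening the target text seems to avoid the error, one workaround I have been considering is splitting long text into sentence-sized chunks before inference and synthesizing them one at a time. A minimal sketch (the 100-character limit is a guess, not a documented MaskGCT threshold, and `split_long_text` is my own helper, not part of the repo):

```python
import re

def split_long_text(text: str, max_chars: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars characters,
    cutting at Chinese/Western sentence-ending punctuation."""
    # split after each sentence-ending mark, keeping the mark itself
    pieces = re.split(r"(?<=[。！？!?.])", text)
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed through `maskgct_inference_pipeline.maskgct_inference` in turn and the resulting waveforms concatenated; I have not tested whether prosody stays natural across chunk boundaries.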