Amphion
[Help]: MaskGCT's results were very strange
Problem Overview
I modified the file models/tts/maskgct/maskgct_inference.py; the changes are as follows:
# inference
prompt_wav_path = "./models/tts/maskgct/wav/5s.wav"
save_path = "generated_audio7.wav"
# Prompt transcript, roughly: "Want to make friends? Come to SOUL!"
prompt_text = "想要交友吗?快来SOUL啊"
# Target text, roughly: "New users really can enjoy an annualized interest rate as low as 3.6%"
target_text = "新用户真的可以享年化利率最低3.6%的优惠"
# Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
target_len = None

maskgct_inference_pipeline = MaskGCT_Inference_Pipeline(
    semantic_model,
    semantic_codec,
    codec_encoder,
    codec_decoder,
    t2s_model,
    s2a_model_1layer,
    s2a_model_full,
    semantic_mean,
    semantic_std,
    device,
)

# Both language arguments are "zh" (prompt language and target language).
recovered_audio = maskgct_inference_pipeline.maskgct_inference(
    prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=target_len
)
# Write the generated waveform at 24 kHz.
sf.write(save_path, recovered_audio, 24000)
Run command:
python -m models.tts.maskgct.maskgct_inference
The generated audio sounded very strange and did not meet my expectations.
My original file: 5s.zip
Output file: generated_audio7.zip
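Since the snippet above leaves target_len = None and relies on "a simple rule" for the duration, a rough way to sanity-check what duration might be reasonable is sketched below. This is only an illustration under an assumption: it scales the prompt duration by the ratio of UTF-8 byte lengths of the two texts. The helper name estimate_target_seconds is hypothetical, and this is not confirmed to be the rule MaskGCT actually applies.

```python
def estimate_target_seconds(
    prompt_seconds: float, prompt_text: str, target_text: str
) -> float:
    """Hypothetical helper (not part of Amphion).

    Assumption: the target duration grows in proportion to the text
    length, measured in UTF-8 bytes, relative to the prompt.
    """
    prompt_bytes = len(prompt_text.encode("utf-8"))
    target_bytes = len(target_text.encode("utf-8"))
    if prompt_bytes == 0:
        raise ValueError("prompt_text must not be empty")
    return prompt_seconds * target_bytes / prompt_bytes


if __name__ == "__main__":
    # 5.0 is an assumed prompt duration, guessed from the file name 5s.wav.
    estimate = estimate_target_seconds(
        5.0,
        "想要交友吗?快来SOUL啊",
        "新用户真的可以享年化利率最低3.6%的优惠",
    )
    print(f"estimated target duration: {estimate:.2f} s")
```

If the estimate looks far from what the sentence should take when read aloud, passing an explicit value through the existing target_len argument (in seconds, per the comment in the snippet) may be worth comparing against the default.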