[Help]: MaskGCT's results were very strange

Open WhiteNightMo opened this issue 3 months ago • 9 comments

Problem Overview

I modified the inference section of models/tts/maskgct/maskgct_inference.py as follows:

    # inference
    prompt_wav_path = "./models/tts/maskgct/wav/5s.wav"
    save_path = "generated_audio7.wav"
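    # English: "Want to make friends? Come to SOUL!"
    prompt_text = "想要交友吗?快来SOUL啊"
    # English: "New users really can get a promotional annualized interest rate as low as 3.6%"
    target_text = "新用户真的可以享年化利率最低3.6%的优惠"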
    # Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
    target_len = None
    maskgct_inference_pipeline = MaskGCT_Inference_Pipeline(
        semantic_model,
        semantic_codec,
        codec_encoder,
        codec_decoder,
        t2s_model,
        s2a_model_1layer,
        s2a_model_full,
        semantic_mean,
        semantic_std,
        device,
    )

    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
        prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=target_len
    )

    sf.write(save_path, recovered_audio, 24000)
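
The "simple rule" mentioned in the comment on target_len is not shown in this snippet. As a rough illustration only, here is a minimal sketch of one such rule, assuming the target duration scales with the ratio of text lengths. The function name `estimate_target_seconds`, the character-count heuristic, and the 5-second prompt duration are all assumptions made for this example, not necessarily what MaskGCT does internally.

```python
def estimate_target_seconds(
    prompt_seconds: float, prompt_text: str, target_text: str
) -> float:
    """Hypothetical rule: scale the prompt duration by the text-length ratio."""
    if not prompt_text:
        raise ValueError("prompt_text must not be empty")
    return prompt_seconds * len(target_text) / len(prompt_text)


prompt_text = "想要交友吗?快来SOUL啊"
target_text = "新用户真的可以享年化利率最低3.6%的优惠"

# 5.0 is an assumed prompt duration, suggested only by the file name 5s.wav.
target_len = estimate_target_seconds(5.0, prompt_text, target_text)
print(f"estimated target_len: {target_len:.2f} s")
```

A value obtained this way could be passed as target_len instead of None to check whether an explicit duration changes the output. Note that a character-count rule of this kind treats "3.6%" as four characters even though it is spoken as considerably more syllables, so it may underestimate the duration for text containing digits and symbols.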

Run command:

python -m models.tts.maskgct.maskgct_inference

The output did not meet my expectations.

My original file: 5s.zip

Output file: generated_audio7.zip

WhiteNightMo, Nov 21 '24 03:11