whisper-vits-svc: first test of USP inference

【岩崎良美《涼風》Cover by 岩崎宏美｜Sovits5.0 Bigvgan-mix-v2 USP inference - bilibili】 https://b23.tv/mKdq4qL

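The result is very good: a single pass produces clean output with no strange artifacts. It is a big improvement over the previous version.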

Jul 12 '23 18:07 Taiwan1912

Is there a concrete code implementation of this USP?

Sep 11 '23 04:09 zhyoung24

The current version, 5.0, should already come with USP inference built in.

Sep 11 '23 04:09 Taiwan1912

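Where exactly is it implemented? As far as I can tell, the inference code just reads the pitch and passes it to inference; I don't see any other operation.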

Sep 11 '23 04:09 zhyoung24

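You would have to ask the author. The documentation does not explain this in detail; it only mentions that the project uses USP inference.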

Sep 11 '23 04:09 Taiwan1912

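Where exactly is it implemented? As far as I can tell, the inference code just reads the pitch and passes it to inference; I don't see any other operation.

It works like this: originally, the pitch produced by crepe had to go through a UV (unvoiced) step that removed the pitch on unvoiced frames. In current tests, skipping that UV removal works better.

The original: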

import librosa
import numpy as np
import torch
import torchcrepe

def compute_f0_sing(filename, device):
    audio, sr = librosa.load(filename, sr=16000)
    assert sr == 16000
    audio = torch.tensor(np.copy(audio))[None]
    # Here we'll use a 20 millisecond hop length
    hop_length = 320
    fmin = 50
    fmax = 1000
    model = "full"
    batch_size = 512
    pitch, periodicity = torchcrepe.predict(
        audio,
        sr,
        hop_length,
        fmin,
        fmax,
        model,
        batch_size=batch_size,
        device=device,
        return_periodicity=True,
    )
    pitch = np.repeat(pitch, 2, -1)  # 320 -> 160 * 2
    periodicity = np.repeat(periodicity, 2, -1)  # 320 -> 160 * 2
    # CREPE was not trained on silent audio, so its errors on silence need filtering.
    periodicity = torchcrepe.filter.median(periodicity, 9)
    pitch = torchcrepe.filter.mean(pitch, 9)
    pitch[periodicity < 0.1] = 0
    pitch = pitch.squeeze(0)
    return pitch
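
The `np.repeat(pitch, 2, -1)` step in the function above is a frame-rate conversion. A minimal sketch of the arithmetic follows, assuming (from the `320 -> 160 * 2` comment) that the downstream features use a hop of 160 samples. The names `frame_ms` and `upsample_frames` are hypothetical helpers for illustration, not part of the project.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, as loaded by librosa in the function above
CREPE_HOP = 320      # samples per pitch frame
TARGET_HOP = 160     # assumed hop of the downstream features

def frame_ms(hop, sr=SAMPLE_RATE):
    """Duration of one frame in milliseconds."""
    return 1000.0 * hop / sr

def upsample_frames(pitch, factor):
    """Repeat every frame `factor` times along the last (time) axis."""
    return np.repeat(pitch, factor, axis=-1)

pitch = np.array([[220.0, 221.0, 0.0]])           # toy contour, shape (1, 3)
factor = CREPE_HOP // TARGET_HOP                  # 2
dense = upsample_frames(pitch, factor)            # shape (1, 6)

print(frame_ms(CREPE_HOP), frame_ms(TARGET_HOP))  # 20.0 10.0
print(dense.shape)                                # (1, 6)
```

Each 20 ms pitch value is simply held for two 10 ms frames; no interpolation takes place.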

The current USP approach:

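import librosa
import numpy as np
import torch
import crepe  # assumed: a module exposing the torchcrepe-style predict/filter API

def compute_f0_sing(filename, device):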
    audio, sr = librosa.load(filename, sr=16000)
    assert sr == 16000
    audio = torch.tensor(np.copy(audio))[None]
    # Add low-amplitude Gaussian noise to the waveform before pitch extraction
    audio = audio + torch.randn_like(audio) * 0.001
    # Here we'll use a 20 millisecond hop length
    hop_length = 320
    fmin = 50
    fmax = 1000
    model = "full"
    batch_size = 512
    pitch = crepe.predict(
        audio,
        sr,
        hop_length,
        fmin,
        fmax,
        model,
        batch_size=batch_size,
        device=device,
        return_periodicity=False,
    )
    pitch = np.repeat(pitch, 2, -1)  # 320 -> 160 * 2
    pitch = crepe.filter.mean(pitch, 5)
    pitch = pitch.squeeze(0)
    return pitch

The key change is that this line was removed: pitch[periodicity < 0.1] = 0

Sep 11 '23 05:09 MaxMax2016

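Oh, thanks a lot. I see that the preprocess code still keeps pitch[periodicity < 0.1] = 0, so training still zeroes those frames and only inference uses USP? And this mismatched setup actually works better?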

Sep 11 '23 05:09 zhyoung24

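That is what testing showed. Using USP in preprocessing makes inference hard to control; for example, long notes break off in the middle.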
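
The mismatch discussed here, with unvoiced frames zeroed at preprocessing time but left untouched at inference time, can be sketched on toy data. This is only an illustration under stated assumptions: `postprocess_f0` and its `mask_unvoiced` flag are hypothetical names, the arrays are made up, and the noise and filtering steps of the real functions are left out.

```python
import numpy as np

def postprocess_f0(pitch, periodicity, mask_unvoiced, threshold=0.1):
    """Return a copy of `pitch`, optionally zeroing low-periodicity frames.

    mask_unvoiced=True  mimics `pitch[periodicity < 0.1] = 0` (preprocessing).
    mask_unvoiced=False keeps every frame's pitch (USP-style inference).
    """
    out = np.array(pitch, dtype=np.float64, copy=True)
    if mask_unvoiced:
        out[np.asarray(periodicity) < threshold] = 0.0
    return out

# Made-up contour: frames 2 and 3 have low periodicity (e.g. breath or silence).
pitch = np.array([220.0, 222.0, 180.0, 95.0, 224.0])
periodicity = np.array([0.90, 0.85, 0.05, 0.02, 0.80])

train_f0 = postprocess_f0(pitch, periodicity, mask_unvoiced=True)
infer_f0 = postprocess_f0(pitch, periodicity, mask_unvoiced=False)

print(train_f0)  # frames 2 and 3 are set to 0
print(infer_f0)  # all frames keep their estimated pitch
```

With masking off, the low-periodicity frames still carry whatever pitch the estimator produced for them, which is the part of the contour that differs between the two stages.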

Sep 11 '23 05:09 MaxMax2016

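Hello. Regarding this issue: after my audio's pitch goes through USP, the inferred output puts a pitch on the breath sounds, so they sound like asthmatic wheezing. Have you run into this problem, and if so, how did you solve it?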

Sep 02 '24 11:09 panxin801
