whisper-vits-svc
First test of USP inference
【岩崎良美《涼風》Cover by 岩崎宏美 | Sovits5.0 Bigvgan-mix-v2 USP inference - Bilibili】 https://b23.tv/mKdq4qL
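The result is very good: the output comes out clean in a single pass, with no odd artifacts. A big improvement over the previous version.
Is there a concrete code implementation of this USP?
The current version, 5.0, should already have USP inference built in.
Where exactly is it implemented? As far as I can see, the inference code just reads the pitch and passes it to inference; I don't see any other processing.
You'd have to ask the author about that. The documentation doesn't explain it in detail; it only says that this project uses USP inference.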
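It works like this: previously, the pitch produced by CREPE had its unvoiced frames zeroed out using the voiced/unvoiced (UV) decision; in current tests, it works better without that UV zeroing.
Original version: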
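A minimal NumPy sketch of the one difference described here, using made-up values: the original pipeline zeroes every frame whose CREPE periodicity (voicing confidence) is below 0.1, while the USP variant keeps the raw pitch for those frames too. The helper name `uv_mask` and the toy arrays are hypothetical; only the 0.1 threshold comes from the original code.

```python
import numpy as np


def uv_mask(pitch, periodicity, threshold=0.1):
    """Zero frames whose periodicity is below `threshold`.

    Hypothetical helper equivalent to `pitch[periodicity < 0.1] = 0`,
    but returning a new array instead of modifying `pitch` in place.
    """
    pitch = np.asarray(pitch, dtype=float)
    periodicity = np.asarray(periodicity, dtype=float)
    return np.where(periodicity < threshold, 0.0, pitch)


# Toy F0 track (Hz) and periodicity for six frames; frames 2 and 3 are
# low-confidence (e.g. silence, breath or an unvoiced consonant).
pitch = np.array([220.0, 221.0, 219.5, 180.0, 222.0, 220.5])
periodicity = np.array([0.90, 0.85, 0.05, 0.08, 0.80, 0.92])

gated = uv_mask(pitch, periodicity)  # original behaviour: frames 2 and 3 become 0
ungated = pitch.copy()               # USP behaviour: every frame keeps its pitch

print(gated)
print(ungated)
```

With gating, the low-confidence frames carry no pitch; without it, they keep whatever F0 the tracker estimated.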
```python
import librosa
import numpy as np
import torch
import torchcrepe


def compute_f0_sing(filename, device):
    audio, sr = librosa.load(filename, sr=16000)
    assert sr == 16000
    audio = torch.tensor(np.copy(audio))[None]
    # Here we'll use a 20 millisecond hop length
    hop_length = 320
    fmin = 50
    fmax = 1000
    model = "full"
    batch_size = 512
    pitch, periodicity = torchcrepe.predict(
        audio,
        sr,
        hop_length,
        fmin,
        fmax,
        model,
        batch_size=batch_size,
        device=device,
        return_periodicity=True,
    )
    # 320 -> 160 * 2: repeat each 20 ms frame twice to get a 10 ms frame rate
    pitch = torch.repeat_interleave(pitch, 2, dim=-1)
    periodicity = torch.repeat_interleave(periodicity, 2, dim=-1)
    # CREPE was not trained on silent audio, so errors on silence need filtering.
    periodicity = torchcrepe.filter.median(periodicity, 9)
    pitch = torchcrepe.filter.mean(pitch, 9)
    # UV masking: zero the pitch wherever periodicity (voicing confidence) < 0.1
    pitch[periodicity < 0.1] = 0
    pitch = pitch.squeeze(0)
    return pitch
```
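The `320 -> 160 * 2` comments describe a frame-rate conversion: at 16 kHz, a 320-sample hop gives one CREPE frame every 20 ms, and repeating each value twice yields one value per 160 samples (10 ms). A small NumPy sketch of that arithmetic, with a toy pitch track in place of real CREPE output:

```python
import numpy as np

SAMPLE_RATE = 16000
CREPE_HOP = 320   # hop_length passed to the pitch tracker
TARGET_HOP = 160  # frame hop implied by the "320 -> 160 * 2" comment

crepe_ms = 1000 * CREPE_HOP / SAMPLE_RATE    # milliseconds per CREPE frame
target_ms = 1000 * TARGET_HOP / SAMPLE_RATE  # milliseconds per output frame
factor = CREPE_HOP // TARGET_HOP             # how many times each value is repeated

# Toy pitch track shaped (batch, frames), like torchcrepe's output for one clip.
crepe_f0 = np.array([[220.0, 230.0, 240.0]])
upsampled = np.repeat(crepe_f0, factor, axis=-1)

print(crepe_ms, target_ms, upsampled.shape)  # 20.0 10.0 (1, 6)
```

Each pitch value is simply held for two output frames; no interpolation is involved.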
The current USP approach:
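```python
import librosa
import numpy as np
import torch
import crepe  # project-local module with a torchcrepe-style API (not the PyPI `crepe` package)


def compute_f0_sing(filename, device):
    audio, sr = librosa.load(filename, sr=16000)
    assert sr == 16000
    audio = torch.tensor(np.copy(audio))[None]
    # Add low-level Gaussian noise (std 0.001) to the waveform before pitch tracking
    audio = audio + torch.randn_like(audio) * 0.001
    # Here we'll use a 20 millisecond hop length
    hop_length = 320
    fmin = 50
    fmax = 1000
    model = "full"
    batch_size = 512
    pitch = crepe.predict(
        audio,
        sr,
        hop_length,
        fmin,
        fmax,
        model,
        batch_size=batch_size,
        device=device,
        return_periodicity=False,
    )
    # 320 -> 160 * 2: repeat each 20 ms frame twice to get a 10 ms frame rate
    pitch = torch.repeat_interleave(pitch, 2, dim=-1)
    pitch = crepe.filter.mean(pitch, 5)
    # No periodicity-based UV masking: unvoiced frames keep their estimated pitch
    pitch = pitch.squeeze(0)
    return pitch
```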
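The USP version also adds `torch.randn_like(audio) * 0.001` to the waveform before pitch tracking; the thread does not explain why. The sketch below only quantifies that amount, with NumPy standing in for torch and an arbitrary seed and one-second length: for float audio in [-1, 1] (what `librosa.load` returns), Gaussian noise with standard deviation 0.001 sits about 60 dB below full scale, and it turns digital silence into a low-level non-zero signal.

```python
import numpy as np

SAMPLE_RATE = 16000
NOISE_STD = 0.001  # same scale factor as in the USP compute_f0_sing

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
silence = np.zeros(SAMPLE_RATE)  # 1 s of digital silence
dithered = silence + rng.standard_normal(SAMPLE_RATE) * NOISE_STD

rms = float(np.sqrt(np.mean(dithered ** 2)))  # measured noise level, close to NOISE_STD
nominal_db = 20 * np.log10(NOISE_STD)         # level relative to full scale (1.0)

print(f"RMS = {rms:.5f}, nominal level = {nominal_db:.1f} dB re full scale")
```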
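In other words, this line was removed: `pitch[periodicity < 0.1] = 0`
I see, thanks. I notice the preprocessing code still keeps `pitch[periodicity < 0.1] = 0`, so during training the unvoiced pitch is still zeroed, and USP is only used at inference? This mismatched setup actually works better?
That's what testing showed. Using USP in preprocessing makes inference less controllable; for example, a long sustained note can break off in the middle.
Hi, regarding this issue: after applying USP to my audio's pitch, the inference output gives breath sounds a pitch, so they sound like asthmatic wheezing. Have you run into this, and how did you solve it?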