Wav2Lip

audio time != mel time

Open · mathpopo opened this issue 1 year ago • 1 comment

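I am using 16 kHz mono audio. With a 6.7025 s WAV I get 167 mel frames, and 167 × 40 ms = 6.68 s, which is about the same. With a 6 s WAV I get 147 mel frames, which is only 5.58 s. Why? The problem is that the audio is still playing at the end, but the mel has no frames left for that part.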

mathpopo, Nov 22 '23 08:11

    # inference.py:328
    mel = audio.melspectrogram(wav)  # shape: (num_mels, num_mel_frames)
    print(mel.shape)

    if np.isnan(mel.reshape(-1)).sum() > 0:
        raise ValueError(
            "Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again"
        )

    # Cut the mel into one window of mel_step_size frames per video frame.
    mel_chunks = []
    mel_idx_multiplier = 80.0 / fps  # mel frames per video frame
    i = 0
    while 1:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > len(mel[0]):
            # Last window: reuse the final mel_step_size frames, then stop.
            mel_chunks.append(mel[:, len(mel[0]) - mel_step_size :])
            break
        mel_chunks.append(mel[:, start_idx : start_idx + mel_step_size])
        i += 1

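melspectrogram pads the signal, and the logic that builds mel_chunks ignores some of the mel windows at the end. Together these cause the time mismatch. By my estimate the difference should stay within 15/80 - 1/25 = 0.1475 s, so I don't quite understand why yours is so large. You can debug the code above to analyze it.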

xuxianren, Jan 18 '24 09:01
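
To follow the debugging suggestion without running the full pipeline, the chunking loop from the excerpt can be replayed on frame counts alone. This is only a rough sketch under stated assumptions: a 16 kHz sample rate, a hop size of 200 samples (which gives the 80 mel frames per second used in the excerpt), mel_step_size = 16, 25 fps, and a centered STFT that yields 1 + n // hop frames. The helper names num_mel_frames, chunk_starts and report are made up for illustration and are not part of the repository.

```python
# Assumed settings; check them against your own hparams before trusting the numbers.
SAMPLE_RATE = 16000   # assumption: 16 kHz audio, as in the question
HOP_SIZE = 200        # assumption: 12.5 ms hop, i.e. 80 mel frames per second
MEL_STEP_SIZE = 16    # assumption: mel frames per chunk
FPS = 25.0            # assumption: 25 video frames per second (40 ms each)


def num_mel_frames(num_samples, hop_size=HOP_SIZE):
    """Frame count of a centered (padded) STFT: 1 + n // hop (assumption)."""
    return 1 + num_samples // hop_size


def chunk_starts(mel_len, fps=FPS, mel_step_size=MEL_STEP_SIZE):
    """Replay the loop from the excerpt, returning only the start indices."""
    multiplier = 80.0 / fps
    starts = []
    i = 0
    while True:
        start = int(i * multiplier)
        if start + mel_step_size > mel_len:
            # Same as the excerpt: the final chunk is the last full window.
            starts.append(mel_len - mel_step_size)
            break
        starts.append(start)
        i += 1
    return starts


def report(duration_s):
    """Return (mel frames, number of chunks, seconds of video covered)."""
    num_samples = int(round(duration_s * SAMPLE_RATE))
    mel_len = num_mel_frames(num_samples)
    starts = chunk_starts(mel_len)
    return mel_len, len(starts), len(starts) / FPS


if __name__ == "__main__":
    mel_len, chunks, video_s = report(6.0)
    print(f"mel frames: {mel_len}, chunks: {chunks}, video seconds: {video_s}")
```

Under these assumptions a 6 s clip gives 481 mel frames and 147 chunks, the same count reported in the question. At 40 ms per chunk that is 5.88 s of video. Comparing mel.shape and len(mel_chunks) from a real run against report() should show whether the padding and the dropped tail windows account for the whole gap.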