Audio buffer is not finite everywhere
When running vocal separation (Windows, 64-bit), I get this error:

Traceback (most recent call last):
  File "D:\voiceprint_migration\RVC-beta\infer-web.py", line 294, in uvr
    pre_fun.path_audio(inp_path, save_root_ins, save_root_vocal)
  File "D:\voiceprint_migration\RVC-beta\infer_uvr5.py", line 135, in path_audio
    wav_instrument = spec_utils.cmb_spectrogram_to_wave(
  File "D:\voiceprint_migration\RVC-beta\uvr5_pack\lib_v5\spec_utils.py", line 392, in cmb_spectrogram_to_wave
    wave = librosa.resample(
  File "D:\voiceprint_migration\RVC-beta\runtime\lib\site-packages\librosa\util\decorators.py", line 104, in inner_f
    return f(**kwargs)
  File "D:\voiceprint_migration\RVC-beta\runtime\lib\site-packages\librosa\core\audio.py", line 606, in resample
    util.valid_audio(y, mono=False)
  File "D:\voiceprint_migration\RVC-beta\runtime\lib\site-packages\librosa\util\decorators.py", line 88, in inner_f
    return f(*args, **kwargs)
  File "D:\voiceprint_migration\RVC-beta\runtime\lib\site-packages\librosa\util\utils.py", line 294, in valid_audio
    raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

Any help would be appreciated.
Check the path of your input audio. Is the clip very short, or the file very small?
My audio files are all cut into 5-second segments. Could that be a factor?
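One quick way to rule out the clip-length suspicion is to print each segment's duration before feeding it to UVR5. A minimal sketch using only the standard-library wave module (the helper name clip_duration_seconds is mine, not part of RVC; it assumes plain PCM WAV input):

```python
import wave

def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())
```

Running this over the 5-second segments should confirm whether any of them is unexpectedly short or empty.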
Hi, I am facing the same issue on Windows 11, in the separation & reverberation removal process. The input file name is 'toSplit.wav'. The error is consistent; I tried various different wav files (downloaded, generated, converted from mp3) and got the same error for all of them.
Is it related to the GPU or system config? Can you please help me with this? Thanks in advance. Please let me know if you need any further info.
toSplit.wav.reformatted.wav ->

Traceback (most recent call last):
  File "E:\RVC\infer-web.py", line 370, in uvr
    pre_fun.path_audio(
  File "E:\RVC\infer_uvr5.py", line 120, in path_audio
    wav_instrument = spec_utils.cmb_spectrogram_to_wave(
  File "E:\RVC\uvr5_pack\lib_v5\spec_utils.py", line 392, in cmb_spectrogram_to_wave
    wave = librosa.resample(
  File "E:\RVC\runtime\lib\site-packages\librosa\util\decorators.py", line 104, in inner_f
    return f(**kwargs)
  File "E:\RVC\runtime\lib\site-packages\librosa\core\audio.py", line 576, in resample
    util.valid_audio(y, mono=False)
  File "E:\RVC\runtime\lib\site-packages\librosa\util\decorators.py", line 88, in inner_f
    return f(*args, **kwargs)
  File "E:\RVC\runtime\lib\site-packages\librosa\util\utils.py", line 294, in valid_audio
    raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
Facing the same issue here too
Hello, I am facing the same issue as well
same here too
I have encountered this problem as well. My GPU is a 1660 Ti, which has poor compatibility with half-precision optimization. Could this be related to the issue?
Still same issue
EDIT: This was happening because I was trying to use my own script. RVC has 3 levels of try-except that re-run the _path_audio_ method, and now I understand why :upside_down_face: This error does not happen to me when using the WebUI
I looked into the cause.
When running the operations step by step in the WebUI, this almost never occurs. That is probably because every time the model is re-entered, librosa re-reads the audio, which strips out potentially non-conforming parts.
I run the various operations from my own scripts, and in most cases the error does not occur there either. When it does, there are several likely causes:
(1) The audio length is too short.
(2) There is complete silence in some long parts of the audio.
(3) Your graphics card may not support half-precision training.
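The first two conditions above can be screened for before separation. A rough numpy-only sketch (the function name, thresholds, and warning strings are all illustrative, not part of RVC):

```python
import numpy as np

def screen_audio(y, sr, min_seconds=1.0, max_silence_seconds=5.0, eps=1e-6):
    """Flag conditions in a mono float buffer that tend to produce
    NaN spectrograms downstream: short clips, long total silence,
    or samples that are already non-finite."""
    warnings = []
    if len(y) < min_seconds * sr:
        warnings.append("too short")
    # find the longest run of consecutive near-zero samples
    silent = np.abs(y) < eps
    run = longest = 0
    for s in silent:
        run = run + 1 if s else 0
        longest = max(longest, run)
    if longest > max_silence_seconds * sr:
        warnings.append("long silence")
    if not np.isfinite(y).all():
        warnings.append("non-finite samples")
    return warnings
```

Clips that trigger any of these warnings are the ones most likely to hit the "Audio buffer is not finite everywhere" error.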
In these cases, tensors such as v_spec_m become NaN before spec_utils.cmb_spectrogram_to_wave() is called in ./infer/modules/uvr5/vr.py. Just before turning NaN, the real part of the tensor's complex signal is 0.
In addition, this often happens during DeReverb; when it does, commenting out the DeReverb step lets everything run normally.
Given the above, I believe simply setting the NaN values to 0 does not affect the final output. My fix is to zero out NaNs first in the cmb_spectrogram_to_wave() function at line 349 of ./infer/lib/uvr5_pack/lib_v5/spec_utils.py. So far I have not seen any abnormal output.
By the way, this situation may be caused by line 63 of ./infer/modules/uvr5/vr.py using librosa.core.load() instead of ffmpeg; the author left a comment there noting a possible bug. Also, it is best to concatenate your training data into longer (10+ minute) audio clips.
def cmb_spectrogram_to_wave(spec_m, mp, extra_bins_h=None, extra_bins=None):
    # Zero out NaNs up front so librosa's finiteness check passes
    spec_m = np.where(np.isnan(spec_m), 0, spec_m)
    if extra_bins_h is not None:
        extra_bins_h = np.where(np.isnan(extra_bins_h), 0, extra_bins_h)
    if extra_bins is not None:
        extra_bins = np.where(np.isnan(extra_bins), 0, extra_bins)
    wave_band = {}
    bands_n = len(mp.param["band"])
    offset = 0

    for d in range(1, bands_n + 1):
        bp = mp.param["band"][d]
        # spec_s = np.ndarray(
        #     shape=(2, bp["n_fft"] // 2 + 1, spec_m.shape[2]), dtype=complex
        # )
        spec_s = np.zeros(
            shape=(2, bp["n_fft"] // 2 + 1, spec_m.shape[2]), dtype=complex
        )
        h = bp["crop_stop"] - bp["crop_start"]
        spec_s[:, bp["crop_start"] : bp["crop_stop"], :] = spec_m[
            :, offset : offset + h, :
        ]

        offset += h
        if d == bands_n:  # higher
            if extra_bins_h:  # if --high_end_process bypass
                max_bin = bp["n_fft"] // 2
                spec_s[:, max_bin - extra_bins_h : max_bin, :] = extra_bins[
                    :, :extra_bins_h, :
                ]
            if bp["hpf_start"] > 0:
                spec_s = fft_hp_filter(spec_s, bp["hpf_start"], bp["hpf_stop"] - 1)
            if bands_n == 1:
                wave = spectrogram_to_wave(
                    spec_s,
                    bp["hl"],
                    mp.param["mid_side"],
                    mp.param["mid_side_b2"],
                    mp.param["reverse"],
                )
            else:
                wave = np.add(
                    wave,
                    spectrogram_to_wave(
                        spec_s,
                        bp["hl"],
                        mp.param["mid_side"],
                        mp.param["mid_side_b2"],
                        mp.param["reverse"],
                    ),
                )
        else:
            sr = mp.param["band"][d + 1]["sr"]
            if d == 1:  # lower
                spec_s = fft_lp_filter(spec_s, bp["lpf_start"], bp["lpf_stop"])
                wave = librosa.resample(
                    spectrogram_to_wave(
                        spec_s,
                        bp["hl"],
                        mp.param["mid_side"],
                        mp.param["mid_side_b2"],
                        mp.param["reverse"],
                    ),
                    bp["sr"],
                    sr,
                    res_type="sinc_fastest",
                )
            else:  # mid
                spec_s = fft_hp_filter(spec_s, bp["hpf_start"], bp["hpf_stop"] - 1)
                spec_s = fft_lp_filter(spec_s, bp["lpf_start"], bp["lpf_stop"])
                wave2 = np.add(
                    wave,
                    spectrogram_to_wave(
                        spec_s,
                        bp["hl"],
                        mp.param["mid_side"],
                        mp.param["mid_side_b2"],
                        mp.param["reverse"],
                    ),
                )
                # wave = librosa.core.resample(wave2, bp['sr'], sr, res_type="sinc_fastest")
                wave = librosa.core.resample(wave2, bp["sr"], sr, res_type="scipy")
                # wave = librosa.core.resample(wave2, bp["sr"], sr, res_type="kaiser_best")

    return wave.T
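The NaN-zeroing step at the top of that patch can be sanity-checked in isolation. A toy example of the same np.where(np.isnan(...), 0, ...) call on a complex spectrogram (the helper name zero_nans is mine, used just for the demonstration):

```python
import numpy as np

def zero_nans(spec):
    """Replace NaN entries with 0, the same trick the patch applies to spec_m."""
    return np.where(np.isnan(spec), 0, spec)

# A complex "spectrogram" with NaN holes, like the ones DeReverb can produce
spec = np.array([[1.0 + 1j, np.nan], [np.nan, 2.0]])
clean = zero_nans(spec)
```

After the call, np.isfinite(clean).all() holds, which is exactly the condition librosa.util.valid_audio enforces on the resampled wave.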
@xiaoshuyimei @ArthurG29 @creativekoalaroo @Zhincore @IndraTensei