
Error running gradio infer_convert API

Open Conghdos opened this issue 1 year ago • 8 comments

When I call infer_convert from the API, I get the following error:

UI/.venv/lib/python3.8/site-packages/gradio/components.py", line 2752, in process_single_file
    f["name"],
TypeError: string indices must be integers
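The traceback is consistent with a File component receiving a plain string where it expects a dict-shaped file payload. Below is a minimal sketch of that failure pattern, assuming gradio indexes the payload with `f["name"]` as the traceback line suggests. `read_file_name` is a hypothetical stand-in, not gradio's actual code.

```python
def read_file_name(f):
    # Hypothetical stand-in for the line shown in the traceback: f["name"]
    return f["name"]


# A dict-shaped payload can be indexed by key.
print(read_file_name({"name": "test.wav", "data": None}))

# A plain string (for example "") cannot, and raises TypeError.
try:
    read_file_name("")
except TypeError as exc:
    print("TypeError:", exc)
```

If this reading is right, the value passed for the File slot is the first thing to check.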

Conghdos avatar Jan 17 '24 05:01 Conghdos

Hello, could you please show me your code? Where do you call the API from? I might be able to help you!

rekvizitt avatar Jan 17 '24 09:01 rekvizitt

I have the same problem. Here is my code.

import { client } from "@gradio/client";
async function run() {
	const response_0 = await fetch("https://github.com/gradio-app/gradio/raw/main/test/test_files/sample_file.pdf");
	const exampleFile = await response_0.blob();
						
	const app = await client("http://xxx.xxx.xxx.xxx:");
	const result = await app.predict("/uvr_convert", [		
				"onnx_dereverb_By_FoxJoy", // string (Option from: ['onnx_dereverb_By_FoxJoy', 'VR-DeEchoDeReverb', 'VR-DeEchoNormal', 'HP5_only_main_vocal', 'HP2_all_vocals', 'VR-DeEchoAggressive', 'HP3_all_vocals']) in 'Model' Dropdown component		
				"Howdy!", // string  in 'Enter the path of the audio folder to be processed:' Textbox component		
				"Howdy!", // string  in 'Specify the output folder for vocals:' Textbox component
				exampleFile, 	// blob in 'Multiple audio files can also be imported. If a folder path exists, this input is ignored.' File component		
				"Howdy!", // string  in 'Specify the output folder for accompaniment:' Textbox component		
				0, // number (numeric value between 0 and 20) in '人声提取激进程度' (vocal extraction aggressiveness) Slider component
				"wav", // string  in 'Export file format' Radio component
	]);

	console.log(result?.data);
}
run();

bawbeans avatar Jan 17 '24 09:01 bawbeans

Is that how you run it? Read the comment on each field carefully and work out what should be entered there. It doesn't have to be "Howdy!"

Here is my working infer_convert code as an example (Python):

from gradio_client import Client

client = Client(f"{client_link}")
result = client.predict(
    0,	# float (numeric value between 0 and 2333) in 'Select Speaker/Singer ID:' Slider component
    f"{vocal_path}/vocal_audio.wav_10.mp3",	# str  in 'Enter the path of the audio file to be processed (default is the correct format example):' Textbox component
    2,	# float  in 'Transpose (integer, number of semitones, raise by an octave: 12, lower by an octave: -12):' Number component
    "D:\\test_audio\\test.mp3", # str (filepath on your computer (or URL) of file) in 'F0 curve file (optional). One pitch per line. Replaces the default F0 and pitch modulation:' File component
    "crepe",	# str  in 'Select the pitch extraction algorithm ('pm': faster extraction but lower-quality speech; 'harvest': better bass but extremely slow; 'crepe': better quality but GPU intensive), 'rmvpe': best quality, and little GPU requirement' Radio component
    "",	# str  in 'Path to the feature index file. Leave blank to use the selected result from the dropdown:' Textbox component
    f"logs/{model_name}.index",	# str (Option from: ['logs/bratishkin.index', 'logs/evelon.index', 'logs/jesusavgn.index', 'logs/kussia.index', 'logs/mazellov.index', 'logs/zolo.index', 'logs/zubarev.index']) in 'Auto-detect index path and select from the dropdown:' Dropdown component
    0.75,	# float (numeric value between 0 and 1) in 'Search feature ratio (controls accent strength, too high has artifacting):' Slider component
    3,	# float (numeric value between 0 and 7) in 'If >=3: apply median filtering to the harvested pitch results. The value represents the filter radius and can reduce breathiness.' Slider component
    0,	# float (numeric value between 0 and 48000) in 'Resample the output audio in post-processing to the final sample rate. Set to 0 for no resampling:' Slider component
    0.25,	# float (numeric value between 0 and 1) in 'Adjust the volume envelope scaling. Closer to 0, the more it mimicks the volume of the original vocals. Can help mask noise and make volume sound more natural when set relatively low. Closer to 1 will be more of a consistently loud volume:' Slider component
    0.33,	# float (numeric value between 0 and 0.5) in 'Protect voiceless consonants and breath sounds to prevent artifacts such as tearing in electronic music. Set to 0.5 to disable. Decrease the value to increase protection, but it may reduce indexing accuracy:' Slider component
    api_name="/infer_convert"
)
print(result)

Note that I run infer-web.py on Google Colab, and I run this infer_convert code on my PC! I also use the variables client_link, vocal_path and model_name (your setup may differ; for example, you may not need these variables).
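Because /infer_convert takes twelve positional values, it is easy to put one in the wrong slot. One possible way to keep them straight is to name each field and build the list in one place. This is an illustrative sketch only: `InferConvertArgs` and its field names are hypothetical, the order and ranges are copied from the comments in the code above, and the paths are placeholders.

```python
from dataclasses import astuple, dataclass
from typing import Any, List

F0_METHODS = ("pm", "harvest", "crepe", "rmvpe")


@dataclass
class InferConvertArgs:
    # Hypothetical field names; the order follows the comments above.
    speaker_id: float        # 0..2333
    input_audio_path: str    # path of the audio file to be processed
    transpose: float         # semitones
    f0_file: str             # F0 curve file (filepath or URL)
    f0_method: str           # one of F0_METHODS
    index_path: str          # typed index path, blank to use the dropdown
    index_dropdown: str      # index selected from the dropdown
    index_rate: float        # 0..1
    filter_radius: float     # 0..7
    resample_sr: float       # 0..48000, 0 means no resampling
    rms_mix_rate: float      # 0..1
    protect: float           # 0..0.5

    def __post_init__(self) -> None:
        ranges = {
            "speaker_id": (0, 2333),
            "index_rate": (0, 1),
            "filter_radius": (0, 7),
            "resample_sr": (0, 48000),
            "rms_mix_rate": (0, 1),
            "protect": (0, 0.5),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
        if self.f0_method not in F0_METHODS:
            raise ValueError(f"unknown f0_method: {self.f0_method!r}")

    def to_list(self) -> List[Any]:
        # astuple keeps the declaration order, which is the positional order.
        return list(astuple(self))


args = InferConvertArgs(
    speaker_id=0,
    input_audio_path="vocals/vocal_audio.wav_10.mp3",  # placeholder path
    transpose=2,
    f0_file="D:\\test_audio\\test.mp3",
    f0_method="crepe",
    index_path="",
    index_dropdown="logs/model_name.index",  # placeholder index
    index_rate=0.75,
    filter_radius=3,
    resample_sr=0,
    rms_mix_rate=0.25,
    protect=0.33,
)
print(args.to_list())
# With a gradio_client Client, the call would then look like:
# result = client.predict(*args.to_list(), api_name="/infer_convert")
```

An out-of-range value then fails locally with a ValueError instead of producing a harder-to-read server-side error.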

rekvizitt avatar Jan 17 '24 09:01 rekvizitt

Thank you for your feedback. For the file argument, is there a way to pass a network address (URL) instead of a local path?

bawbeans avatar Jan 17 '24 10:01 bawbeans

Sorry, unfortunately I don't know, because I haven't used it.

rekvizitt avatar Jan 17 '24 10:01 rekvizitt

@rekvizitt Here is my code

const result = await app.predict("/infer_convert", [
                    0, // number (numeric value between 0 and 2333) in 'Select Speaker/Singer ID:' Slider component		
                    filePath, // string  in 'Enter the path of the audio file to be processed (default is the correct format example):' Textbox component		
                    5, // number  in 'Transpose (integer, number of semitones, raise by an octave: 12, lower by an octave: -12):' Number component
                    "", 	// blob in 'F0 curve file (optional). One pitch per line. Replaces the default F0 and pitch modulation:' File component		
                    "rmvpe", // string  in 'Select the pitch extraction algorithm ('pm': faster extraction but lower-quality speech; 'harvest': better bass but extremely slow; 'crepe': better quality but GPU intensive), 'rmvpe': best quality, and little GPU requirement' Radio component		
                    ``, // string  in 'Path to the feature index file. Leave blank to use the selected result from the dropdown:' Textbox component		
                    `logs/${v}/${v}.index`, // string (Option from: ['logs/HNNgocHuyen/added_IVF606_Flat_nprobe_1_HNNgocHuyen_v2.index', 'logs/Jinni - Former NMIXX - Weights.gg Model/Jini/jinni.index', 'logs/LinhLan/model.index']) in 'Auto-detect index path and select from the dropdown:' Dropdown component		
                    0, // number (numeric value between 0 and 1) in 'Search feature ratio (controls accent strength, too high has artifacting):' Slider component		
                    0, // number (numeric value between 0 and 7) in 'If >=3: apply median filtering to the harvested pitch results. The value represents the filter radius and can reduce breathiness.' Slider component		
                    0, // number (numeric value between 0 and 48000) in 'Resample the output audio in post-processing to the final sample rate. Set to 0 for no resampling:' Slider component		
                    0, // number (numeric value between 0 and 1) in 'Adjust the volume envelope scaling. Closer to 0, the more it mimicks the volume of the original vocals. Can help mask noise and make volume sound more natural when set relatively low. Closer to 1 will be more of a consistently loud volume:' Slider component		
                    0, // number (numeric value between 0 and 0.5) in 'Protect voiceless consonants and breath sounds to prevent artifacts such as tearing in electronic music. Set to 0.5 to disable. Decrease the value to increase protection, but it may reduce indexing accuracy:' Slider component
                ]);
                console.log(result?.data);

Conghdos avatar Jan 17 '24 10:01 Conghdos

@rekvizitt

I tried using "" for the F0 file argument, but it throws an error. I don't want to use this file at all, as I'm doing simple speech-to-speech processing. I tried null, None and "", and nothing has worked. Is it effectively required?

from gradio_client import Client

def process_with_rvc(self, input_path, output_path):
    client = Client("http://localhost:7865/")

    result = client.predict(
        0,  # float (numeric value between 0 and 2333) in 'Select Speaker/Singer ID:' Slider component
        input_path,  # str in 'Enter the path of the audio file to be processed (default is the correct format example):' Textbox component
        0,  # float in 'Transpose (integer, number of semitones, raise by an octave: 12, lower by an octave: -12):' Number component
        "",  # str (filepath on your computer (or URL) of file) in 'F0 curve file (optional). One pitch per line. Replaces the default F0 and pitch modulation:' File component
        "rmvpe",  # str in 'Select the pitch extraction algorithm ('pm': faster extraction but lower-quality speech; 'harvest': better bass but extremely slow; 'crepe': better quality but GPU intensive), 'rmvpe': best quality, and little GPU requirement' Radio component
        self.rvc_index_path.get(),  # str in 'Path to the feature index file. Leave blank to use the selected result from the dropdown:' Textbox component
        "",  # str (Option from: []) in 'Auto-detect index path and select from the dropdown:' Dropdown component
        0,  # float (numeric value between 0 and 1) in 'Search feature ratio (controls accent strength, too high has artifacting):' Slider component
        0,  # float (numeric value between 0 and 7) in 'If >=3: apply median filtering to the harvested pitch results. The value represents the filter radius and can reduce breathiness.' Slider component
        0,  # float (numeric value between 0 and 48000) in 'Resample the output audio in post-processing to the final sample rate. Set to 0 for no resampling:' Slider component
        0,  # float (numeric value between 0 and 1) in 'Adjust the volume envelope scaling. Closer to 0, the more it mimicks the volume of the original vocals. Can help mask noise and make volume sound more natural when set relatively low. Closer to 1 will be more of a consistently loud volume:' Slider component
        0,  # float (numeric value between 0 and 0.5) in 'Protect voiceless consonants and breath sounds to prevent artifacts such as tearing in electronic music. Set to 0.5 to disable. Decrease the value to increase protection, but it may reduce indexing accuracy:' Slider component
        api_name="/infer_convert"
    )

    print(result)

    with open(output_path, "wb") as f:
        f.write(result)
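
One thing worth checking in the snippet above is the type of `result`. The sketch below assumes, without confirmation from this thread, that /infer_convert returns a pair of an info message and the path to the converted audio file. In that case the file has to be copied rather than written as bytes. `save_converted_audio` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Sequence, Union


def save_converted_audio(result: Union[str, Sequence], output_path: str) -> Path:
    """Copy the audio file referenced by `result` to `output_path`."""
    # Assumption: the last element of a tuple/list result is the audio path.
    audio_path = result if isinstance(result, str) else result[-1]
    if not audio_path or not Path(audio_path).is_file():
        raise FileNotFoundError(f"No audio file in result: {result!r}")
    destination = Path(output_path)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(audio_path, destination)
    return destination


# Possible use in place of the open()/write() lines:
# save_converted_audio(result, output_path)
```

Printing `type(result)` before saving is a quick way to confirm or rule out this assumption.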

lukaszliniewicz avatar Mar 09 '24 18:03 lukaszliniewicz

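import requests
import base64
from pathlib import Path

# Path to the audio file
audio_file_path = 'C:/Users/Administrator/Desktop/2123m.wav'

# Make sure the audio file exists
if Path(audio_file_path).is_file():
    # Open the audio file and read its contents
    with open(audio_file_path, 'rb') as audio_file:
        audio_data = audio_file.read()

    # Encode the audio data as base64
    audio_base64 = base64.b64encode(audio_data).decode('utf-8')

    # Build the request payload, including the base64-encoded audio
    payload = {
        "data": [
            "HP2_all_vocals",  # Model selection; replace with the actual model name
            "",  # Path of the folder of audio to be processed; replace with the actual folder path
            "opt",  # Output folder for the main vocals; replace with the actual folder path
            {"name": "2123m.wav", "data": f"data:audio/wav;base64,{audio_base64}"},  # Audio file name and base64 data
            "opt",  # Output folder for everything other than the main vocals; replace with the actual folder path
            0,  # Vocal extraction aggressiveness
            "wav"  # Export file format
        ]
    }

    # Send the POST request
    response = requests.post("http://localhost:7897/run/uvr_convert", json=payload)

    # Check the response status
    if response.status_code == 200:
        # Parse the response data
        data = response.json()['data']

        # Print the result
        print("Processing succeeded, output:", data)
    else:
        print("Request failed, status code:", response.status_code)
else:
    print("The audio file path is incorrect or the file does not exist.")

E:\RVC1006Nvidia>test.py

Processing succeeded, output: [None]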

2024-03-29 23:39:47 | INFO | infer.modules.uvr5.modules | Executed torch.cuda.empty_cache()
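
The file entry in the payload above is a dict holding a file name and a base64 data URL. A small helper can build that dict for any file. This is only a sketch of the same encoding, with `file_to_payload` as a hypothetical name, and it does not settle whether the endpoint expects a single dict or a list of them.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def file_to_payload(path: str, mime_type: Optional[str] = None) -> dict:
    """Return {"name": ..., "data": "data:<mime>;base64,<...>"} for a file."""
    file_path = Path(path)
    if mime_type is None:
        # Guess from the extension; fall back to a generic binary type.
        guessed, _ = mimetypes.guess_type(file_path.name)
        mime_type = guessed or "application/octet-stream"
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return {
        "name": file_path.name,
        "data": f"data:{mime_type};base64,{encoded}",
    }


# Example matching the payload above (the path is the one from the post):
# entry = file_to_payload("C:/Users/Administrator/Desktop/2123m.wav", "audio/wav")
```

Passing `mime_type` explicitly keeps the data URL prefix identical to the one in the post, since the guessed type for .wav can differ between systems.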

jacksinofn avatar Mar 29 '24 15:03 jacksinofn