Retrieval-based-Voice-Conversion-WebUI
myinfer.py has not been updated
New arguments were added to the vc_single function in infer-web.py, which myinfer.py calls. Command-line usage of RVC doesn't seem to work because of this parameter mismatch.
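Schematically (parameter names as they appear later in this thread; the exact order in the current infer-web.py may differ):

# call that myinfer.py still makes:
#   vc_single(sid, input_audio_path, f0_up_key, f0_file, f0_method, file_index, file_index2, index_rate)
# vc_single as infer-web.py now defines it, with the newer parameters appended:
#   vc_single(sid, input_audio_path, f0_up_key, f0_file, f0_method, file_index, file_index2,
#             index_rate, filter_radius, resample_sr, rms_mix_rate, protect)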
I noticed this and tried to work around it by manually plugging the missing variables into the function, but I received an "Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same" error.
I then rewrote the function to match vc_single() in infer-web.py and received the same error.
Traceback (most recent call last):
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 247, in <module>
wavfile.write(opt_path, tgt_sr, wav_opt)
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 202, in vc_single_new
hubert_model,
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/vc_infer_pipeline.py", line 313, in pipeline
self.vc(
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/vc_infer_pipeline.py", line 163, in vc
logits = model.extract_features(**inputs)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 535, in extract_features
res = self.forward(
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 437, in forward
features = self.forward_features(source)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 392, in forward_features
features = self.feature_extractor(source)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 895, in forward
x = conv(x)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
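For reference, this error means the features tensor reaching HuBERT is still float32 while the model's weights were loaded as float16. The pipeline has to cast the features to the model's precision before calling extract_features; a minimal sketch of that kind of cast, assuming audio is the float32 NumPy chunk and is_half/device mirror the config values below:

import numpy as np
import torch

audio = np.zeros(16000, dtype=np.float32)  # stand-in for the loaded audio chunk
is_half, device = True, "cuda:0"           # assumed config values

feats = torch.from_numpy(audio).float()
feats = feats.half() if is_half else feats.float()  # match the weights' dtype
feats = feats.view(1, -1).to(device)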
I have verified that the variables I pass into my function have the same values as when I run the Web UI, except for the file paths. I am in over my head at this point. Can someone provide guidance?
model_path="weights/oblivion_guard_v2.pth"
device="cuda:0"
is_half=True
version = "v1"
#####
#####
#####
def vc_single_new(sid,
input_audio_path,
f0_up_key,
f0_file,
f0_method,
file_index,
file_index2,
index_rate,
filter_radius,
resample_sr,
rms_mix_rate,
): #
global tgt_sr, net_g, vc, hubert_model, version
if input_audio_path is None:
raise TypeError("Please provide a file to convert")
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file,
)
return audio_opt
#####
#####
#####
wav_opt = vc_single_new(0, input_audio_path = "example.wav", f0_up_key=0,f0_file=None,f0_method="harvest",file_index="",file_index2="",index_rate=1.0, filter_radius=3, resample_sr=0, rms_mix_rate=1.0)
wavfile.write("example_processed.wav", tgt_sr, wav_opt)
@samirdigital
I've developed a script that now works with my version. I compared it against infer-web.py, and it's working.
I run the following example command in Ubuntu.
python myinferer.py -6 "example.wav" "example_processed.wav" "weights/oblivion_guard_v2.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" "" 1 3 0 1.0
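For clarity, those positional arguments map onto the script's sys.argv parsing in this order:

# -6                                        f0_up_key (transpose, in semitones)
# "example.wav"                             input_path
# "example_processed.wav"                   opt_path (output file)
# "weights/oblivion_guard_v2.pth"           model_path
# "cuda:0"                                  device
# "False"                                   is_half
# "harvest"                                 f0method (pm or harvest)
# "logs/.../added_IVF2892_Flat_nprobe_1_v1.index"  file_index
# ""                                        file_index2 (fallback index)
# 1                                         index_rate (search feature ratio)
# 3                                         filter_radius (median filter)
# 0                                         resample_sr (post-processing resample)
# 1.0                                       rms_mix_rate (volume envelope mix)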
I'm not a listed collaborator on this project, so I can't propose a pull request; I'm going to paste my script here instead. Note that I did change the order of the arguments to be a little cleaner. I think this was going to break existing configurations regardless, as more inputs are now required.
@fumiama if you approve I'd like to update the English documentation to reflect this script.
import os,sys,pdb,torch
now_dir = os.getcwd()
sys.path.append(now_dir)
import argparse
import glob
import sys
import torch
import numpy as np
from multiprocessing import cpu_count
class Config:
def __init__(self,device,is_half):
self.device = device
self.is_half = is_half
self.n_cpu = 0
self.gpu_name = None
self.gpu_mem = None
self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key=int(sys.argv[1]) #transpose value
input_path=sys.argv[2]
opt_path=sys.argv[3]
model_path=sys.argv[4]
device=sys.argv[5]
is_half=sys.argv[6]
f0method=sys.argv[7] #pm or harvest
file_index=sys.argv[8] #.index file
file_index2=sys.argv[9]
index_rate=float(sys.argv[10]) #search feature ratio
filter_radius=float(sys.argv[11]) #median filter
resample_sr=float(sys.argv[12]) #resample audio in post processing
rms_mix_rate=float(sys.argv[13]) # volume envelope mix rate
print(sys.argv)
if(is_half.lower() == 'true'):
is_half = True
else:
is_half = False
config=Config(device,is_half)
now_dir=os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model=None
def load_hubert():
global hubert_model
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
["hubert_base.pt"],
suffix="",
)
hubert_model = models[0]
hubert_model = hubert_model.to(config.device)
if config.is_half:
hubert_model = hubert_model.half()
else:
hubert_model = hubert_model.float()
hubert_model.eval()
def vc_single(
sid=0,
input_audio_path=None,
f0_up_key=0,
f0_file=None,
f0_method="pm",
file_index="", #.index file
file_index2="",
# file_big_npy,
index_rate=1.0,
filter_radius=3,
resample_sr=0,
rms_mix_rate=1.0,
):
global tgt_sr, net_g, vc, hubert_model, version
if input_audio_path is None:
return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
)
return audio_opt
def get_vc(model_path):
global n_spk,tgt_sr,net_g,vc,cpt,device,is_half, version
print("loading pth %s"%model_path)
cpt = torch.load(model_path, map_location="cpu")
tgt_sr = cpt["config"][-1]
cpt["config"][-3]=cpt["weight"]["emb_g.weight"].shape[0]#n_spk
if_f0=cpt.get("f0",1)
version = cpt.get("version", "v1")
if(if_f0==1):
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(device)
if (is_half):net_g = net_g.half()
else:net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk=cpt["config"][-3]
# return {"visible": True,"maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt=vc_single(0,input_path,f0_up_key,None,f0method,file_index,file_index2,index_rate,filter_radius,resample_sr,rms_mix_rate)
wavfile.write(opt_path, tgt_sr, wav_opt)
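As a side note, the bare sys.argv parsing above fails silently when arguments are missing or misordered; a sketch of the same positional interface built on argparse (the names mirror the script's variables; this is not part of the committed script):

import argparse

parser = argparse.ArgumentParser(description="RVC command-line inference")
parser.add_argument("f0_up_key", type=int, help="transpose in semitones")
parser.add_argument("input_path")
parser.add_argument("opt_path", help="output wav path")
parser.add_argument("model_path", help=".pth file under weights/")
parser.add_argument("device", help='e.g. "cuda:0" or "cpu"')
parser.add_argument("is_half", choices=["True", "False"])
parser.add_argument("f0method", choices=["pm", "harvest", "crepe"])
parser.add_argument("file_index", help=".index file path, may be empty")
parser.add_argument("file_index2", help="fallback .index file path")
parser.add_argument("index_rate", type=float)
parser.add_argument("filter_radius", type=float)
parser.add_argument("resample_sr", type=float)
parser.add_argument("rms_mix_rate", type=float)
args = parser.parse_args()  # argparse reports missing or extra arguments itself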
Hi guys, I was trying to use it from the command prompt, and I also found a problem with the myinferer.py from sethtallen.
It's probably due to some new functionality update. I did some analysis and filled in the newer inference call based on sethtallen's snippet; it now runs on the current version.
The call is the same, but with the protect variable added at the end:
I run the following example command in Ubuntu:
python myinferer.py -6 "example.wav" "example_processed.wav" "weights/oblivion_guard_v2.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" "" 1 3 0 1.0 0.38
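The only change from the earlier command is that trailing value, which the script reads as the new protect parameter:

protect = float(sys.argv[14])  # protect voiceless consonants/breaths; 0 to 0.5, 0.5 disables it in the WebUI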
Feel free to include that in the project.
Kr,
Otavio
import os, sys, pdb, torch
now_dir = os.getcwd()
sys.path.append(now_dir)
import argparse
import glob
import sys
import torch
import numpy as np
from multiprocessing import cpu_count
from infer_pack.models import (
    SynthesizerTrnMs256NSFsid,
    SynthesizerTrnMs256NSFsid_nono,
    SynthesizerTrnMs768NSFsid,
    SynthesizerTrnMs768NSFsid_nono,
)
class Config:
    def __init__(self, device, is_half):
        self.device = device
        self.is_half = is_half
        self.n_cpu = 0
        self.gpu_name = None
        self.gpu_mem = None
        self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key = int(sys.argv[1])  # transpose value
input_path = sys.argv[2]
opt_path = sys.argv[3]
model_path = sys.argv[4]
device = sys.argv[5]
is_half = sys.argv[6]
f0method = sys.argv[7]  # pm or harvest
file_index = sys.argv[8]  # .index file
file_index2 = sys.argv[9]
index_rate = float(sys.argv[10])  # search feature ratio
filter_radius = float(sys.argv[11])  # median filter
resample_sr = float(sys.argv[12])  # resample audio in post-processing
rms_mix_rate = float(sys.argv[13])  # volume envelope mix rate
protect = float(sys.argv[14])  # protect voiceless consonants
print(sys.argv)
if is_half.lower() == 'true':
    is_half = True
else:
    is_half = False
config = Config(device, is_half)
now_dir = os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model = None

def load_hubert():
    global hubert_model
    models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
        ["hubert_base.pt"],
        suffix="",
    )
    hubert_model = models[0]
    hubert_model = hubert_model.to(config.device)
    if config.is_half:
        hubert_model = hubert_model.half()
    else:
        hubert_model = hubert_model.float()
    hubert_model.eval()
def vc_single(
    sid=0,
    input_audio_path=None,
    f0_up_key=0,
    f0_file=None,
    f0_method="pm",
    file_index="",  # .index file
    file_index2="",
    # file_big_npy,
    index_rate=1.0,
    filter_radius=3,
    resample_sr=0,
    rms_mix_rate=1.0,
    protect=0.38,
):
    global tgt_sr, net_g, vc, hubert_model, version
    if input_audio_path is None:
        return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
protect = protect
)
return audio_opt
def get_vc(model_path):
    global n_spk, tgt_sr, net_g, vc, cpt, version
    if model_path == "" or model_path == []:
        global hubert_model
        if hubert_model != None:
            print("clean_empty_cache")
            del net_g, n_spk, vc, hubert_model, tgt_sr
            hubert_model = net_g = n_spk = vc = hubert_model = tgt_sr = None
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            if_f0 = cpt.get("f0", 1)
            version = cpt.get("version", "v1")
            if version == "v1":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
            elif version == "v2":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
            del net_g, cpt
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            cpt = None
        return {"visible": False, "__type__": "update"}
print("loading %s" % model_path) # Changed 'person' to 'model_path'
cpt = torch.load(model_path, map_location="cpu") # Changed 'person' to 'model_path'
tgt_sr = cpt["config"][-1]
cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]
if_f0 = cpt.get("f0", 1)
version = cpt.get("version", "v1")
if version == "v1":
if if_f0 == 1:
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
elif version == "v2":
if if_f0 == 1:
net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(config.device)
if config.is_half:
net_g = net_g.half()
else:
net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk = cpt["config"][-3]
return {"visible": True, "maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt = vc_single(0, input_path, f0_up_key, None, f0method, file_index, file_index2, index_rate, filter_radius, resample_sr, rms_mix_rate, protect)
wavfile.write(opt_path, tgt_sr, wav_opt)
Got this error:
TypeError: VC.pipeline() got multiple values for argument 'f0_file'
My command is:
python myinfer1.py -6 "/workspace/audio/amit.mp3" "/workspace/audio/example_processed.wav" "weights/amit.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" " " 1 3 0 1.0 0.38
I took this code:
import os
import sys
import pdb
import torch
import numpy as np
from multiprocessing import cpu_count
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from infer_pack.models import (
    SynthesizerTrnMs256NSFsid,
    SynthesizerTrnMs256NSFsid_nono,
    #SynthesizerTrnMs768NSFsid,
    #SynthesizerTrnMs768NSFsid_nono,
)
class Config:
    def __init__(self, device, is_half):
        self.device = device
        self.is_half = is_half
        self.n_cpu = 0
        self.gpu_name = None
        self.gpu_mem = None
        self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key = int(sys.argv[1])  # transpose value
input_path = sys.argv[2]
opt_path = sys.argv[3]
model_path = sys.argv[4]
device = sys.argv[5]
is_half = sys.argv[6]
f0method = sys.argv[7]  # pm or harvest
file_index = sys.argv[8]  # .index file
file_index2 = sys.argv[9]
index_rate = float(sys.argv[10])  # search feature ratio
filter_radius = float(sys.argv[11])  # median filter
resample_sr = float(sys.argv[12])  # resample audio in post-processing
rms_mix_rate = float(sys.argv[13])  # volume envelope mix rate
protect = float(sys.argv[14])  # protect voiceless consonants
print(sys.argv)
if is_half.lower() == 'true':
    is_half = True
else:
    is_half = False
config = Config(device, is_half)
now_dir = os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model = None
def load_hubert():
    global hubert_model
    models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
        ["hubert_base.pt"],
        suffix="",
    )
    hubert_model = models[0]
    hubert_model = hubert_model.to(config.device)
    if config.is_half:
        hubert_model = hubert_model.half()
    else:
        hubert_model = hubert_model.float()
    hubert_model.eval()
def vc_single(
    sid=0,
    input_audio_path=None,
    f0_up_key=0,
    f0_file=None,
    f0_method="pm",
    file_index="",  # .index file
    file_index2="",
    # file_big_npy,
    index_rate=1.0,
    filter_radius=3,
    resample_sr=0,
    rms_mix_rate=1.0,
    protect=0.38,
):
    global tgt_sr, net_g, vc, hubert_model, version
    if input_audio_path is None:
        return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model is None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
protect=protect
)
return audio_opt
def get_vc(model_path):
    global n_spk, tgt_sr, net_g, vc, cpt, version
    if model_path == "" or model_path == []:
        global hubert_model
        if hubert_model is not None:
            print("clean_empty_cache")
            del net_g, n_spk, vc, hubert_model, tgt_sr
            hubert_model = net_g = n_spk = vc = hubert_model = tgt_sr = None
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            if_f0 = cpt.get("f0", 1)
            version = cpt.get("version", "v1")
            if version == "v1":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
            #elif version == "v2":
            #    if if_f0 == 1:
            #        net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
            #    else:
            #        net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
            del net_g, cpt
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            cpt = None
        return {"visible": False, "__type__": "update"}
print("loading %s" % model_path)
cpt = torch.load(model_path, map_location="cpu")
tgt_sr = cpt["config"][-1]
cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]
if_f0 = cpt.get("f0", 1)
version = cpt.get("version", "v1")
if version == "v1":
if if_f0 == 1:
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
#elif version == "v2":
# if if_f0 == 1:
# net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
# else:
# net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(config.device)
if config.is_half:
net_g = net_g.half()
else:
net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk = cpt["config"][-3]
return {"visible": True, "maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt = vc_single(
    0,
    input_path,
    f0_up_key,
    None,
    f0method,
    file_index,
    file_index2,
    index_rate,
    filter_radius,
    resample_sr,
    rms_mix_rate,
    protect,
)
wavfile.write(opt_path, tgt_sr, wav_opt)
@allthingssecurity I've actually started another repo with this file I'm maintaining. I think they updated it. I just recently used this so I know it works on the newest version. Let me know if it works for you.
https://github.com/sethtallen/RBVC_CLI_TOOL
Thanks, but it still doesn't work for me. Here is the output of the python file I used from your repo:
!python myinfer.py 0 "/content/drive/MyDrive/jain/audio/kannada1.mp3" "/content/drive/MyDrive/jain/audio/smj_kannada.wav" "/content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth" "" "cuda:0" "pm"
['myinfer.py', '0', '/content/drive/MyDrive/jain/audio/kannada1.mp3', '/content/drive/MyDrive/jain/audio/smj_kannada.wav', '/content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth', '', 'cuda:0', 'pm']
2023-07-26 02:49:10.497965: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-26 02:49:11.545609: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-07-26 02:49:13 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
loading pth /content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
2023-07-26 02:49:18 | INFO | fairseq.tasks.hubert_pretraining | current directory is /content/Retrieval-based-Voice-Conversion-WebUI
2023-07-26 02:49:18 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-07-26 02:49:18 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 230, in
Are you running this from Colab? I've not done that. I've attempted to reproduce it, but accessing the terminal within Colab unfortunately requires a membership. I think this is a Colab-specific problem. I've tested the script against the newest version before and didn't have an issue.
Also, I'm guessing you replaced the contents of myinfer.py with the infer_cli.py file? You are calling myinfer.py.
The f0 file should equal None by default when the function is called within my script; I don't really understand how it's getting multiple values. If you have copied the contents of infer_cli.py into myinfer.py, I can give you some instructions on how we can troubleshoot this.
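For reference, a "got multiple values" TypeError is the classic symptom of calling a function whose local signature is shorter than the call site assumes: the extra positional arguments shift into later parameter slots until one lands on f0_file, which the call then also supplies by keyword. A schematic illustration (not the real pipeline signature):

def pipeline(a, b, c, f0_file=None):
    pass

pipeline(1, 2, 3, 4, f0_file=None)
# TypeError: pipeline() got multiple values for argument 'f0_file'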
I think I fixed the issue. I had to map the parameters exactly. I made these changes to make it work:
audio_opt = vc.pipeline(
    hubert_model,
    net_g,
    sid,
    audio,
    #input_audio_path,
    times,
    f0_up_key,
    f0_method,
    file_index,
    index_rate,
    if_f0,
    #filter_radius,
    #tgt_sr,
    #resample_sr,
    #rms_mix_rate,
    #version,
    f0_file=f0_file,
    #protect=protect
)
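A quick way to confirm which arguments a local copy of vc_infer_pipeline.py actually expects before mapping a call like this, using only the standard library:

import inspect
from vc_infer_pipeline import VC  # assumes the repo root is on sys.path

print(inspect.signature(VC.pipeline))  # prints the exact parameter list, including self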
Thanks a lot anyway. It now works in Colab as well as other places.
@sethtallen Hi, Seth. Can you please explain the arguments? [TRANSPOSE_VALUE] "[INPUT_PATH]" "[OUTPUT_PATH]" "[MODEL_PATH]" "[INDEX_FILE_PATH]" "[INFERENCE_DEVICE]" "[METHOD]"
@akkharolia Transpose = transpose.
Input and output are the audio files you're converting.
Model path is in your weights folder. Index file path is the .index file you find in logs/[model_name]/*.index.
Inference device is either CPU or whatever index you have for your GPU (cuda:0).
Method is either pm, harvest, or crepe.
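Putting those together, an invocation in that argument order might look like this (the paths are placeholders):

python myinfer.py 0 "input.wav" "output.wav" "weights/model.pth" "logs/model/added_model.index" "cuda:0" "harvest"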
So, there's no way to train RVC on a voice and create a model using the CLI?
The script I've committed is only for inference, not for training. I don't know whether there's currently a script for training a model via the CLI, but I reckon it would be a worthwhile one to add.
Thanks for updating, @sethtallen. I'll try building a training script.
Let me know if you would like help; I am willing to assist. My email is [email protected].
@allthingssecurity I've actually started another repo with this file I'm maintaining. I think they updated it. I just recently used this so I know it works on the newest version. Let me know if it works for you.
https://github.com/sethtallen/RBVC_CLI_TOOL
Page not found :)