Retrieval-based-Voice-Conversion-WebUI
myinfer.py has not been updated
New arguments were added to the vc_single function in infer-web.py, which myinfer.py calls. Command-line usage of RVC doesn't seem to work because of this parameter mismatch.
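Schematically (parameter names as they appear later in this thread; the exact order in the current infer-web.py may differ):

# call that myinfer.py still makes:
#   vc_single(sid, input_audio_path, f0_up_key, f0_file, f0_method, file_index, file_index2, index_rate)
# vc_single as infer-web.py now defines it, with the newer parameters appended:
#   vc_single(sid, input_audio_path, f0_up_key, f0_file, f0_method, file_index, file_index2,
#             index_rate, filter_radius, resample_sr, rms_mix_rate, protect)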
I noticed this and tried to work around it by manually plugging the missing variables into the function, but I received an "Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same" error.
I then rewrote the function to match vc_single() in infer-web.py and received the same error.
Traceback (most recent call last):
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 247, in <module>
wavfile.write(opt_path, tgt_sr, wav_opt)
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 202, in vc_single_new
hubert_model,
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/vc_infer_pipeline.py", line 313, in pipeline
self.vc(
File "/home/gadget/Retrieval-based-Voice-Conversion-WebUI/vc_infer_pipeline.py", line 163, in vc
logits = model.extract_features(**inputs)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 535, in extract_features
res = self.forward(
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 437, in forward
features = self.forward_features(source)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 392, in forward_features
features = self.feature_extractor(source)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 895, in forward
x = conv(x)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/gadget/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
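For reference, this error means the features tensor reaching HuBERT is still float32 while the model's weights were loaded as float16. The pipeline has to cast the features to the model's precision before calling extract_features; a minimal sketch of that kind of cast, assuming audio is the float32 NumPy chunk and is_half/device mirror the config values below:

import numpy as np
import torch

audio = np.zeros(16000, dtype=np.float32)  # stand-in for the loaded audio chunk
is_half, device = True, "cuda:0"           # assumed config values

feats = torch.from_numpy(audio).float()
feats = feats.half() if is_half else feats.float()  # match the weights' dtype
feats = feats.view(1, -1).to(device)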
I have verified that the variables I pass into my function have the same values as when I run the Web UI, except for the file paths. I am in over my head at this point. Can someone provide guidance?
model_path="weights/oblivion_guard_v2.pth"
device="cuda:0"
is_half=True
version = "v1"
#####
#####
#####
def vc_single_new(sid,
input_audio_path,
f0_up_key,
f0_file,
f0_method,
file_index,
file_index2,
index_rate,
filter_radius,
resample_sr,
rms_mix_rate,
): #
global tgt_sr, net_g, vc, hubert_model, version
if input_audio_path is None:
raise TypeError("Please provide a file to convert")
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file,
)
return audio_opt
#####
#####
#####
wav_opt = vc_single_new(0, input_audio_path = "example.wav", f0_up_key=0,f0_file=None,f0_method="harvest",file_index="",file_index2="",index_rate=1.0, filter_radius=3, resample_sr=0, rms_mix_rate=1.0)
wavfile.write("example_processed.wav", tgt_sr, wav_opt)
@samirdigital
I've developed a script that now works with my version. I compared it against infer-web.py, and it's working.
I run the following example command in Ubuntu.
python myinferer.py -6 "example.wav" "example_processed.wav" "weights/oblivion_guard_v2.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" "" 1 3 0 1.0
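For clarity, those positional arguments map onto the script's sys.argv parsing in this order:

# -6                                        f0_up_key (transpose, in semitones)
# "example.wav"                             input_path
# "example_processed.wav"                   opt_path (output file)
# "weights/oblivion_guard_v2.pth"           model_path
# "cuda:0"                                  device
# "False"                                   is_half
# "harvest"                                 f0method (pm or harvest)
# "logs/.../added_IVF2892_Flat_nprobe_1_v1.index"  file_index
# ""                                        file_index2 (fallback index)
# 1                                         index_rate (search feature ratio)
# 3                                         filter_radius (median filter)
# 0                                         resample_sr (post-processing resample)
# 1.0                                       rms_mix_rate (volume envelope mix)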
I'm not a listed collaborator on this project, so I can't propose a pull request; I'm going to paste my script here instead. Note that I did change the order of the arguments to be a little cleaner. I think this was going to break existing configurations regardless, as more inputs are now required.
@fumiama if you approve I'd like to update the English documentation to reflect this script.
import os,sys,pdb,torch
now_dir = os.getcwd()
sys.path.append(now_dir)
import argparse
import glob
import sys
import torch
import numpy as np
from multiprocessing import cpu_count
class Config:
def __init__(self,device,is_half):
self.device = device
self.is_half = is_half
self.n_cpu = 0
self.gpu_name = None
self.gpu_mem = None
self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key=int(sys.argv[1]) #transpose value
input_path=sys.argv[2]
opt_path=sys.argv[3]
model_path=sys.argv[4]
device=sys.argv[5]
is_half=sys.argv[6]
f0method=sys.argv[7] #pm or harvest
file_index=sys.argv[8] #.index file
file_index2=sys.argv[9]
index_rate=float(sys.argv[10]) #search feature ratio
filter_radius=float(sys.argv[11]) #median filter
resample_sr=float(sys.argv[12]) #resample audio in post processing
rms_mix_rate=float(sys.argv[13]) # volume envelope mix rate
print(sys.argv)
if(is_half.lower() == 'true'):
is_half = True
else:
is_half = False
config=Config(device,is_half)
now_dir=os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model=None
def load_hubert():
global hubert_model
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
["hubert_base.pt"],
suffix="",
)
hubert_model = models[0]
hubert_model = hubert_model.to(config.device)
if config.is_half:
hubert_model = hubert_model.half()
else:
hubert_model = hubert_model.float()
hubert_model.eval()
def vc_single(
sid=0,
input_audio_path=None,
f0_up_key=0,
f0_file=None,
f0_method="pm",
file_index="", #.index file
file_index2="",
# file_big_npy,
index_rate=1.0,
filter_radius=3,
resample_sr=0,
rms_mix_rate=1.0,
):
global tgt_sr, net_g, vc, hubert_model, version
if input_audio_path is None:
return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
)
return audio_opt
def get_vc(model_path):
global n_spk,tgt_sr,net_g,vc,cpt,device,is_half, version
print("loading pth %s"%model_path)
cpt = torch.load(model_path, map_location="cpu")
tgt_sr = cpt["config"][-1]
cpt["config"][-3]=cpt["weight"]["emb_g.weight"].shape[0]#n_spk
if_f0=cpt.get("f0",1)
version = cpt.get("version", "v1")
if(if_f0==1):
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(device)
if (is_half):net_g = net_g.half()
else:net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk=cpt["config"][-3]
# return {"visible": True,"maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt=vc_single(0,input_path,f0_up_key,None,f0method,file_index,file_index2,index_rate,filter_radius,resample_sr,rms_mix_rate)
wavfile.write(opt_path, tgt_sr, wav_opt)
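As a side note, the bare sys.argv parsing above fails silently when arguments are missing or misordered; a sketch of the same positional interface built on argparse (the names mirror the script's variables; this is not part of the committed script):

import argparse

parser = argparse.ArgumentParser(description="RVC command-line inference")
parser.add_argument("f0_up_key", type=int, help="transpose in semitones")
parser.add_argument("input_path")
parser.add_argument("opt_path", help="output wav path")
parser.add_argument("model_path", help=".pth file under weights/")
parser.add_argument("device", help='e.g. "cuda:0" or "cpu"')
parser.add_argument("is_half", choices=["True", "False"])
parser.add_argument("f0method", choices=["pm", "harvest", "crepe"])
parser.add_argument("file_index", help=".index file path, may be empty")
parser.add_argument("file_index2", help="fallback .index file path")
parser.add_argument("index_rate", type=float)
parser.add_argument("filter_radius", type=float)
parser.add_argument("resample_sr", type=float)
parser.add_argument("rms_mix_rate", type=float)
args = parser.parse_args()  # argparse reports missing or extra arguments itself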
Hi guys, I was trying to use it from the command prompt, and I also found a problem with the myinferer.py from sethtallen.
It's probably due to some new functionality update. I did some analysis and filled in the newer inference call based on sethtallen's snippet; it now runs on the current version.
The call is the same, but with the protect variable added at the end:
I run the following example command in Ubuntu:
python myinferer.py -6 "example.wav" "example_processed.wav" "weights/oblivion_guard_v2.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" "" 1 3 0 1.0 0.38
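The only change from the earlier command is that trailing value, which the script reads as the new protect parameter:

protect = float(sys.argv[14])  # protect voiceless consonants/breaths; 0 to 0.5, 0.5 disables it in the WebUI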
Feel free to include that in the project.
Kr,
Otavio
import os, sys, pdb, torch
now_dir = os.getcwd()
sys.path.append(now_dir)
import argparse
import glob
import sys
import torch
import numpy as np
from multiprocessing import cpu_count
from infer_pack.models import (
    SynthesizerTrnMs256NSFsid,
    SynthesizerTrnMs256NSFsid_nono,
    SynthesizerTrnMs768NSFsid,
    SynthesizerTrnMs768NSFsid_nono,
)
class Config:
    def __init__(self, device, is_half):
        self.device = device
        self.is_half = is_half
        self.n_cpu = 0
        self.gpu_name = None
        self.gpu_mem = None
        self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key = int(sys.argv[1])  # transpose value
input_path = sys.argv[2]
opt_path = sys.argv[3]
model_path = sys.argv[4]
device = sys.argv[5]
is_half = sys.argv[6]
f0method = sys.argv[7]  # pm or harvest
file_index = sys.argv[8]  # .index file
file_index2 = sys.argv[9]
index_rate = float(sys.argv[10])  # search feature ratio
filter_radius = float(sys.argv[11])  # median filter
resample_sr = float(sys.argv[12])  # resample audio in post-processing
rms_mix_rate = float(sys.argv[13])  # volume envelope mix rate
protect = float(sys.argv[14])  # protect voiceless consonants
print(sys.argv)
if is_half.lower() == 'true':
    is_half = True
else:
    is_half = False
config = Config(device, is_half)
now_dir = os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model = None

def load_hubert():
    global hubert_model
    models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
        ["hubert_base.pt"],
        suffix="",
    )
    hubert_model = models[0]
    hubert_model = hubert_model.to(config.device)
    if config.is_half:
        hubert_model = hubert_model.half()
    else:
        hubert_model = hubert_model.float()
    hubert_model.eval()
def vc_single(
    sid=0,
    input_audio_path=None,
    f0_up_key=0,
    f0_file=None,
    f0_method="pm",
    file_index="",  # .index file
    file_index2="",
    # file_big_npy,
    index_rate=1.0,
    filter_radius=3,
    resample_sr=0,
    rms_mix_rate=1.0,
    protect=0.38,
):
    global tgt_sr, net_g, vc, hubert_model, version
    if input_audio_path is None:
        return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model == None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
(
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
)
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
protect = protect
)
return audio_opt
def get_vc(model_path):
    global n_spk, tgt_sr, net_g, vc, cpt, version
    if model_path == "" or model_path == []:
        global hubert_model
        if hubert_model != None:
            print("clean_empty_cache")
            del net_g, n_spk, vc, hubert_model, tgt_sr
            hubert_model = net_g = n_spk = vc = hubert_model = tgt_sr = None
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            if_f0 = cpt.get("f0", 1)
            version = cpt.get("version", "v1")
            if version == "v1":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
            elif version == "v2":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
            del net_g, cpt
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            cpt = None
        return {"visible": False, "__type__": "update"}
print("loading %s" % model_path) # Changed 'person' to 'model_path'
cpt = torch.load(model_path, map_location="cpu") # Changed 'person' to 'model_path'
tgt_sr = cpt["config"][-1]
cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]
if_f0 = cpt.get("f0", 1)
version = cpt.get("version", "v1")
if version == "v1":
if if_f0 == 1:
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
elif version == "v2":
if if_f0 == 1:
net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(config.device)
if config.is_half:
net_g = net_g.half()
else:
net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk = cpt["config"][-3]
return {"visible": True, "maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt = vc_single(0, input_path, f0_up_key, None, f0method, file_index, file_index2, index_rate, filter_radius, resample_sr, rms_mix_rate, protect)
wavfile.write(opt_path, tgt_sr, wav_opt)
Got this error:
TypeError: VC.pipeline() got multiple values for argument 'f0_file'
My command is:
python myinfer1.py -6 "/workspace/audio/amit.mp3" "/workspace/audio/example_processed.wav" "weights/amit.pth" "cuda:0" "False" "harvest" "logs/oblivion_guard_v2/added_IVF2892_Flat_nprobe_1_v1.index" " " 1 3 0 1.0 0.38
I took this code:
import os
import sys
import pdb
import torch
import numpy as np
from multiprocessing import cpu_count
from infer_pack.models import SynthesizerTrnMs256NSFsid, SynthesizerTrnMs256NSFsid_nono
from infer_pack.models import (
    SynthesizerTrnMs256NSFsid,
    SynthesizerTrnMs256NSFsid_nono,
    #SynthesizerTrnMs768NSFsid,
    #SynthesizerTrnMs768NSFsid_nono,
)
class Config:
    def __init__(self, device, is_half):
        self.device = device
        self.is_half = is_half
        self.n_cpu = 0
        self.gpu_name = None
        self.gpu_mem = None
        self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
def device_config(self) -> tuple:
if torch.cuda.is_available():
i_device = int(self.device.split(":")[-1])
self.gpu_name = torch.cuda.get_device_name(i_device)
if (
("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
or "P40" in self.gpu_name.upper()
or "1060" in self.gpu_name
or "1070" in self.gpu_name
or "1080" in self.gpu_name
):
print("16系/10系显卡和P40强制单精度")
self.is_half = False
for config_file in ["32k.json", "40k.json", "48k.json"]:
with open(f"configs/{config_file}", "r") as f:
strr = f.read().replace("true", "false")
with open(f"configs/{config_file}", "w") as f:
f.write(strr)
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
else:
self.gpu_name = None
self.gpu_mem = int(
torch.cuda.get_device_properties(i_device).total_memory
/ 1024
/ 1024
/ 1024
+ 0.4
)
if self.gpu_mem <= 4:
with open("trainset_preprocess_pipeline_print.py", "r") as f:
strr = f.read().replace("3.7", "3.0")
with open("trainset_preprocess_pipeline_print.py", "w") as f:
f.write(strr)
elif torch.backends.mps.is_available():
print("没有发现支持的N卡, 使用MPS进行推理")
self.device = "mps"
else:
print("没有发现支持的N卡, 使用CPU进行推理")
self.device = "cpu"
self.is_half = True
if self.n_cpu == 0:
self.n_cpu = cpu_count()
if self.is_half:
            # 6GB VRAM configuration
x_pad = 3
x_query = 10
x_center = 60
x_max = 65
else:
            # 5GB VRAM configuration
x_pad = 1
x_query = 6
x_center = 38
x_max = 41
if self.gpu_mem != None and self.gpu_mem <= 4:
x_pad = 1
x_query = 5
x_center = 30
x_max = 32
return x_pad, x_query, x_center, x_max
f0_up_key = int(sys.argv[1])  # transpose value
input_path = sys.argv[2]
opt_path = sys.argv[3]
model_path = sys.argv[4]
device = sys.argv[5]
is_half = sys.argv[6]
f0method = sys.argv[7]  # pm or harvest
file_index = sys.argv[8]  # .index file
file_index2 = sys.argv[9]
index_rate = float(sys.argv[10])  # search feature ratio
filter_radius = float(sys.argv[11])  # median filter
resample_sr = float(sys.argv[12])  # resample audio in post-processing
rms_mix_rate = float(sys.argv[13])  # volume envelope mix rate
protect = float(sys.argv[14])  # protect voiceless consonants
print(sys.argv)
if is_half.lower() == 'true':
    is_half = True
else:
    is_half = False
config = Config(device, is_half)
now_dir = os.getcwd()
sys.path.append(now_dir)
from vc_infer_pipeline import VC
from my_utils import load_audio
from fairseq import checkpoint_utils
from scipy.io import wavfile
hubert_model = None
def load_hubert():
    global hubert_model
    models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
        ["hubert_base.pt"],
        suffix="",
    )
    hubert_model = models[0]
    hubert_model = hubert_model.to(config.device)
    if config.is_half:
        hubert_model = hubert_model.half()
    else:
        hubert_model = hubert_model.float()
    hubert_model.eval()
def vc_single(
    sid=0,
    input_audio_path=None,
    f0_up_key=0,
    f0_file=None,
    f0_method="pm",
    file_index="",  # .index file
    file_index2="",
    # file_big_npy,
    index_rate=1.0,
    filter_radius=3,
    resample_sr=0,
    rms_mix_rate=1.0,
    protect=0.38,
):
    global tgt_sr, net_g, vc, hubert_model, version
    if input_audio_path is None:
        return "You need to upload an audio file", None
f0_up_key = int(f0_up_key)
audio = load_audio(input_audio_path, 16000)
audio_max = np.abs(audio).max() / 0.95
if audio_max > 1:
audio /= audio_max
times = [0, 0, 0]
if hubert_model is None:
load_hubert()
if_f0 = cpt.get("f0", 1)
file_index = (
file_index.strip(" ")
.strip('"')
.strip("\n")
.strip('"')
.strip(" ")
.replace("trained", "added")
if file_index != ""
else file_index2
)
audio_opt = vc.pipeline(
hubert_model,
net_g,
sid,
audio,
input_audio_path,
times,
f0_up_key,
f0_method,
file_index,
# file_big_npy,
index_rate,
if_f0,
filter_radius,
tgt_sr,
resample_sr,
rms_mix_rate,
version,
f0_file=f0_file,
protect=protect
)
return audio_opt
def get_vc(model_path):
    global n_spk, tgt_sr, net_g, vc, cpt, version
    if model_path == "" or model_path == []:
        global hubert_model
        if hubert_model is not None:
            print("clean_empty_cache")
            del net_g, n_spk, vc, hubert_model, tgt_sr
            hubert_model = net_g = n_spk = vc = hubert_model = tgt_sr = None
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            if_f0 = cpt.get("f0", 1)
            version = cpt.get("version", "v1")
            if version == "v1":
                if if_f0 == 1:
                    net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
                else:
                    net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
            #elif version == "v2":
            #    if if_f0 == 1:
            #        net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
            #    else:
            #        net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
            del net_g, cpt
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            cpt = None
        return {"visible": False, "__type__": "update"}
print("loading %s" % model_path)
cpt = torch.load(model_path, map_location="cpu")
tgt_sr = cpt["config"][-1]
cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]
if_f0 = cpt.get("f0", 1)
version = cpt.get("version", "v1")
if version == "v1":
if if_f0 == 1:
net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
else:
net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
#elif version == "v2":
# if if_f0 == 1:
# net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
# else:
# net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
del net_g.enc_q
print(net_g.load_state_dict(cpt["weight"], strict=False))
net_g.eval().to(config.device)
if config.is_half:
net_g = net_g.half()
else:
net_g = net_g.float()
vc = VC(tgt_sr, config)
n_spk = cpt["config"][-3]
return {"visible": True, "maximum": n_spk, "__type__": "update"}
get_vc(model_path)
wav_opt = vc_single(
    0,
    input_path,
    f0_up_key,
    None,
    f0method,
    file_index,
    file_index2,
    index_rate,
    filter_radius,
    resample_sr,
    rms_mix_rate,
    protect,
)
wavfile.write(opt_path, tgt_sr, wav_opt)
@allthingssecurity I've actually started another repo with this file I'm maintaining. I think they updated it. I just recently used this so I know it works on the newest version. Let me know if it works for you.
https://github.com/sethtallen/RBVC_CLI_TOOL
Thanks, but it still doesn't work for me. Here is the output of the python file I used from your repo:
!python myinfer.py 0 "/content/drive/MyDrive/jain/audio/kannada1.mp3" "/content/drive/MyDrive/jain/audio/smj_kannada.wav" "/content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth" "" "cuda:0" "pm"
['myinfer.py', '0', '/content/drive/MyDrive/jain/audio/kannada1.mp3', '/content/drive/MyDrive/jain/audio/smj_kannada.wav', '/content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth', '', 'cuda:0', 'pm']
2023-07-26 02:49:10.497965: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-26 02:49:11.545609: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-07-26 02:49:13 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
loading pth /content/Retrieval-based-Voice-Conversion-WebUI/weights/smj.pth
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
2023-07-26 02:49:18 | INFO | fairseq.tasks.hubert_pretraining | current directory is /content/Retrieval-based-Voice-Conversion-WebUI
2023-07-26 02:49:18 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-07-26 02:49:18 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
File "/content/Retrieval-based-Voice-Conversion-WebUI/myinfer.py", line 230, in
Are you running this from Colab? I've not done that. I've attempted to reproduce it, but accessing the terminal within Colab unfortunately requires a membership. I think this is a Colab-specific problem. I've tested the script against the newest version before and didn't have an issue.
Also, I'm guessing you replaced the contents of myinfer.py with the infer_cli.py file? You are calling myinfer.py.
The f0 file should equal None by default when the function is called within my script; I don't really understand how it's getting multiple values. If you have copied the contents of infer_cli.py into myinfer.py, I can give you some instructions on how we can troubleshoot this.
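For reference, a "got multiple values" TypeError is the classic symptom of calling a function whose local signature is shorter than the call site assumes: the extra positional arguments shift into later parameter slots until one lands on f0_file, which the call then also supplies by keyword. A schematic illustration (not the real pipeline signature):

def pipeline(a, b, c, f0_file=None):
    pass

pipeline(1, 2, 3, 4, f0_file=None)
# TypeError: pipeline() got multiple values for argument 'f0_file'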
I think I fixed the issue. I had to map the parameters exactly. I made these changes to make it work:
audio_opt = vc.pipeline(
    hubert_model,
    net_g,
    sid,
    audio,
    #input_audio_path,
    times,
    f0_up_key,
    f0_method,
    file_index,
    index_rate,
    if_f0,
    #filter_radius,
    #tgt_sr,
    #resample_sr,
    #rms_mix_rate,
    #version,
    f0_file=f0_file,
    #protect=protect
)
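A quick way to confirm which arguments a local copy of vc_infer_pipeline.py actually expects before mapping a call like this, using only the standard library:

import inspect
from vc_infer_pipeline import VC  # assumes the repo root is on sys.path

print(inspect.signature(VC.pipeline))  # prints the exact parameter list, including self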
Thanks a lot anyway. It now works in Colab as well as other places.
@sethtallen Hi, Seth. Can you please explain the arguments? [TRANSPOSE_VALUE] "[INPUT_PATH]" "[OUTPUT_PATH]" "[MODEL_PATH]" "[INDEX_FILE_PATH]" "[INFERENCE_DEVICE]" "[METHOD]"
@akkharolia Transpose = transpose.
Input and output are the audio files you're converting.
Model path is in your weights folder. Index file path is the .index file you find in logs/[model_name]/*.index.
Inference device is either CPU or whatever index you have for your GPU (cuda:0).
Method is either pm, harvest, or crepe.
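Putting those together, an invocation in that argument order might look like this (the paths are placeholders):

python myinfer.py 0 "input.wav" "output.wav" "weights/model.pth" "logs/model/added_model.index" "cuda:0" "harvest"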
So, there's no way to train RVC on a voice and create a model using the CLI?
The script I've committed is only for inference, not for training. I don't know whether there's currently a script for training a model via the CLI, but I reckon it would be a worthwhile one to add.
Thanks for updating, @sethtallen. I'll try building a training script.
Let me know if you would like help; I am willing to assist. My email is [email protected].
@allthingssecurity I've actually started another repo with this file I'm maintaining. I think they updated it. I just recently used this so I know it works on the newest version. Let me know if it works for you.
https://github.com/sethtallen/RBVC_CLI_TOOL
Page not found :)