autovc
How to use this repo for just testing?
I just want to play with this repo; I don't want to train or build anything, just use it a few times. Any instructions on how to do that?
Just do what the README says~
0. Convert mel-spectrograms: download the pre-trained AUTOVC model and run conversion.ipynb in the same directory.
1. Mel-spectrograms to waveform: download the pre-trained WaveNet vocoder model and run vocoder.ipynb in the same directory (a sketch of this step follows below).
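For step 1, here is roughly what vocoder.ipynb does, reconstructed from memory of the repo (the checkpoint name checkpoint_step001000000_ema.pth and the build_model/wavegen helpers come from the repo's synthesis.py, and I swapped the deprecated librosa.output.write_wav for soundfile, so verify against your copy):

```python
import pickle
import torch
import soundfile as sf  # replaces the deprecated librosa.output.write_wav
from synthesis import build_model, wavegen

# results.pkl is written by conversion.ipynb (step 0)
spect_vc = pickle.load(open('results.pkl', 'rb'))

device = torch.device('cuda')
model = build_model().to(device)
checkpoint = torch.load('checkpoint_step001000000_ema.pth')
model.load_state_dict(checkpoint['state_dict'])

for name, c in spect_vc:
    print(name)
    waveform = wavegen(model, c=c)  # WaveNet synthesis; slow on long clips
    sf.write(name + '.wav', waveform, 16000)
```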
Please note the training metadata and testing metadata have different formats.
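Concretely, I believe the two layouts are as follows (inferred from make_metadata.py and conversion.ipynb; treat the shapes and paths as assumptions and check them against your copies):

```python
import pickle

# Training metadata: ./spmel/train.pkl, written by make_metadata.py.
# Each entry stores *paths* to mel-spectrograms for the data loader:
#   [speaker_id, speaker_embedding (256,), 'p225/p225_001.npy', ...]
train_meta = pickle.load(open('./spmel/train.pkl', 'rb'))
print(train_meta[0][0], train_meta[0][1].shape, train_meta[0][2])

# Testing metadata: metadata.pkl, shipped with the demo and read by
# conversion.ipynb. Each entry stores the mel-spectrogram *array* itself:
#   [speaker_id, speaker_embedding (256,), mel_spectrogram (T, 80)]
test_meta = pickle.load(open('metadata.pkl', 'rb'))
print(test_meta[0][0], test_meta[0][1].shape, test_meta[0][2].shape)
```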
And how do I run inference after that?
The important thing is to get "metadata.pkl". You can get it by running make_spect.py followed by python make_metadata.py. If you run them directly, they use the author's wavs; if you change the wavs to your own, metadata.pkl will be built from your wavs. Then read the code in conversion.ipynb and run it~
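For reference, the core of conversion.ipynb is roughly the following (reconstructed from memory of the repo's notebook; the Generator(32, 256, 512, 32) signature and the 'autovc.ckpt' checkpoint name are assumptions to verify against your copy):

```python
import pickle
from math import ceil

import numpy as np
import torch

from model_vc import Generator

def pad_seq(x, base=32):
    # pad the mel-spectrogram so its length is a multiple of 32
    len_out = int(base * ceil(float(x.shape[0]) / base))
    len_pad = len_out - x.shape[0]
    return np.pad(x, ((0, len_pad), (0, 0)), 'constant'), len_pad

device = 'cuda:0'
G = Generator(32, 256, 512, 32).eval().to(device)
g_checkpoint = torch.load('autovc.ckpt', map_location=device)
G.load_state_dict(g_checkpoint['model'])

metadata = pickle.load(open('metadata.pkl', 'rb'))
spect_vc = []
for sbmt_i in metadata:                      # source speaker
    x_org, len_pad = pad_seq(sbmt_i[2])
    uttr_org = torch.from_numpy(x_org[np.newaxis, :, :]).to(device)
    emb_org = torch.from_numpy(sbmt_i[1][np.newaxis, :]).to(device)
    for sbmt_j in metadata:                  # target speaker
        emb_trg = torch.from_numpy(sbmt_j[1][np.newaxis, :]).to(device)
        with torch.no_grad():
            _, x_identic_psnt, _ = G(uttr_org, emb_org, emb_trg)
        uttr_trg = x_identic_psnt[0, 0].cpu().numpy()
        if len_pad > 0:
            uttr_trg = uttr_trg[:-len_pad]   # strip the padding again
        spect_vc.append(('{}x{}'.format(sbmt_i[0], sbmt_j[0]), uttr_trg))

with open('results.pkl', 'wb') as handle:
    pickle.dump(spect_vc, handle)            # consumed by vocoder.ipynb
```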
python make_metadata.py does NOT generate "metadata.pkl". You can check the code.
@ruclion I have the same problem make_metadata.py does NOT generate "metadata.pkl".
> python make_metadata.py does NOT generate "metadata.pkl". You can check the code.
Hello, I ran into the same problem. Could you share how you solved it? Thank you in advance!
> I just want to play with this repo; I don't want to train or build anything, just use it a few times. Any instructions on how to do that?
Have you solved this? Could you share the solution, please? Thank you.
Same here. Do you have any solution?
If you put only one wav file into each speaker directory, this modified make_metadata.py should work:
```python
import os
import pickle
from collections import OrderedDict

import numpy as np
import torch

from model_bl import D_VECTOR

# Load the pre-trained speaker encoder
C = D_VECTOR(dim_input=80, dim_cell=768, dim_emb=256).eval().cuda()
c_checkpoint = torch.load('3000000-BL.ckpt')
new_state_dict = OrderedDict()
for key, val in c_checkpoint['model_b'].items():
    new_state_dict[key[7:]] = val  # strip the 'module.' prefix left by DataParallel
C.load_state_dict(new_state_dict)

num_uttrs = 1   # utterances per speaker; each speaker dir needs at least this many files
len_crop = 128  # unused in this simplified script

# Directory containing mel-spectrograms (one subdirectory per speaker)
rootDir = './spmel'
dirName, subdirList, _ = next(os.walk(rootDir))
print('Found directory: %s' % dirName)

speakers = []
for speaker in sorted(subdirList):
    if len(speaker) != 4:  # skip folders that are not VCTK-style speaker IDs
        continue
    print('Processing speaker: %s' % speaker)
    utterances = [speaker]
    _, _, fileList = next(os.walk(os.path.join(dirName, speaker)))
    idx_uttrs = np.random.choice(len(fileList), size=num_uttrs, replace=False)
    embs = []
    mel_specs = []
    for i in range(num_uttrs):
        tmp = np.load(os.path.join(dirName, speaker, fileList[idx_uttrs[i]]))
        melsp = torch.from_numpy(tmp).cuda().unsqueeze(0)
        with torch.no_grad():
            emb = C(melsp)
        embs.append(emb.detach().squeeze().cpu().numpy())
        mel_specs.append(melsp.squeeze(0))
    utterances.append(np.mean(embs, axis=0))  # the speaker embedding
    for mel_spec in mel_specs:
        utterances.append(mel_spec.cpu().numpy())  # the mel-spectrogram array(s)
    speakers.append(utterances)

print('number of speakers:', len(speakers))
with open('metadata_own.pkl', 'wb') as handle:
    pickle.dump(speakers, handle)
```
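To use it: run python make_spect.py first so that ./spmel contains one .npy mel-spectrogram per wav, then run the script above, and finally point conversion.ipynb at the new file, e.g.:

```python
# in conversion.ipynb, load the metadata built from your own wavs
metadata = pickle.load(open('metadata_own.pkl', 'rb'))
```

Note that the len(speaker) != 4 filter assumes VCTK-style four-character speaker IDs (e.g. p225); rename your speaker folders or adjust that check for your own data.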
How do I use my own source content wav and target style wav? Thank you.
@dragen1860 have you fixed the issue?