DeepAFx-ST
[Improvement] Increased sample rate to 44100 and added the ability to process entire files.
I managed to improve DeepAFx-ST. Here's what I did.
Download the zip from https://github.com/adobe-research/DeepAFx-ST and extract it.
Open Notepad++, press Ctrl+Shift+F to open Find in Files, search for 24000, replace it with 44100, set the directory to the extracted folder, and click Replace in Files.
At this point you can safely add the checkpoints and examples.
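If you'd rather script the find-and-replace step than use Notepad++, here is a minimal Python sketch. It is not part of the DeepAFx-ST repository. The replace_in_files name and the TEXT_EXTENSIONS set are assumptions, so check them against your checkout. Unlike a plain text replace, it only matches 24000 when it is not part of a longer number, so a value like 240000 is left alone. It also skips any file whose extension is not listed, such as checkpoint files.

```python
import re
from pathlib import Path

# Assumed set of text-file extensions worth editing; adjust for your checkout.
TEXT_EXTENSIONS = {".py", ".sh", ".yaml", ".yml", ".json", ".txt", ".md"}

# Match 24000 only when it is not part of a longer number (e.g. 240000).
PATTERN = re.compile(r"(?<!\d)24000(?!\d)")


def replace_in_files(root, extensions=TEXT_EXTENSIONS):
    """Replace 24000 with 44100 in text files under root.

    Returns a dict mapping each modified file path to its replacement count.
    """
    changed = {}
    for path in Path(root).rglob("*"):
        # Skip directories and anything without a listed extension (e.g. .ckpt).
        if not path.is_file() or path.suffix not in extensions:
            continue
        raw = path.read_bytes()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8 text; leave it untouched
        new_text, count = PATTERN.subn("44100", text)
        if count:
            # Write bytes back so existing line endings are preserved.
            path.write_bytes(new_text.encode("utf-8"))
            changed[str(path)] = count
    return changed
```

For example, replace_in_files("path/to/DeepAFx-ST") (a placeholder path) edits the files in place. It returns the files it changed, so you can review them before continuing.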
Edit scripts/process.py
Replace x_44100 = torch.tensor(resampy.resample(x.view(-1).numpy(), x_sr, 44100))
with x_44100 = torch.tensor(resampy.resample(x.reshape(-1).numpy(), x_sr, 44100))
Under x_44100 = x_44100.view(1, -1)
insert x_44100 = x_44100[0:1, : x_44100.shape[-1] // 2]
Under x_44100 = x
insert x_44100 = x_44100[0:1, : x_44100.shape[-1]]
Replace r_44100 = torch.tensor(resampy.resample(r.view(-1).numpy(), r_sr, 44100))
with r_44100 = torch.tensor(resampy.resample(r.reshape(-1).numpy(), r_sr, 44100))
Under r_44100 = r_44100.view(1, -1)
insert r_44100 = r_44100[0:1, : r_44100.shape[-1] // 2]
Under r_44100 = r
insert r_44100 = r_44100[0:1, : r_44100.shape[-1]]
Remove x_44100 = x_44100[0:1, : 44100 * 5]
Remove r_44100 = r_44100[0:1, : 44100 * 5]
Replace filename = os.path.basename(args.input).replace(".wav", "")
with filename = os.path.splitext(os.path.basename(args.input))[0]
Remove reference = os.path.basename(args.reference).replace(".wav", "")
Replace out_filepath = os.path.join(dirname, f"{filename}_out_ref={reference}.wav")
with out_filepath = os.path.join(dirname, f"{filename}_DeepAFx-ST.wav")
Remove in_filepath = os.path.join(dirname, f"{filename}_in.wav")
Remove torchaudio.save(in_filepath, x_44100.cpu().view(1, -1), 44100)
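The stdlib sketch below shows what the filename changes do. It compares the original .replace(".wav", "") with os.path.splitext. The helper names old_name, new_name, and output_path are hypothetical. output_path also assumes that dirname in process.py is the input file's directory.

```python
import os


def old_name(path):
    # Original line: removes every ".wav" substring, leaves other extensions.
    return os.path.basename(path).replace(".wav", "")


def new_name(path):
    # Patched line: drops only the final extension, whatever it is.
    return os.path.splitext(os.path.basename(path))[0]


def output_path(input_path):
    # Assumes dirname is the input file's directory, as in the patched out_filepath.
    dirname = os.path.dirname(input_path)
    return os.path.join(dirname, f"{new_name(input_path)}_DeepAFx-ST.wav")
```

With the patched naming, a non-WAV input such as take.flac produces take_DeepAFx-ST.wav instead of take.flac_out_ref=<reference>.wav. The reference name is no longer part of the output filename, so processing the same input with a different reference overwrites the previous result.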
You should be good to go!
This approach may have broken some things unrelated to processing.
#24 #22
How can I get the parameters of the EQ and compressor?
Please use the other issue you made for discussion. Your comment does not fit here.
If your results are getting cut in half or doubled, try removing or adding the // 2 in both the x_44100 and r_44100 slicing lines.
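The // 2 appears to assume stereo input. Suppose the audio is loaded as a (channels, samples) tensor, which is what torchaudio.load returns by default. Then reshape(-1) concatenates the channels into one long signal:

- For stereo, the first half of the resampled result is approximately the first channel.
- For mono, the first half is only the first half of the audio.
- Without // 2, stereo input comes out with both channels back to back, which is twice as long.

The branch that skips resampling (x_44100 = x) already takes channel 0 without halving.

The NumPy sketch below reproduces this on toy arrays and leaves out the resampling step. The function names are hypothetical. first_channel is an alternative that selects channel 0 before flattening, so it behaves the same for mono and stereo. It has not been tested inside process.py.

```python
import numpy as np


def flatten_then_halve(x):
    # Mimics the resampling branch (minus the resample itself): flatten
    # (channels, samples) into one row, then keep the first half.
    flat = x.reshape(1, -1)
    return flat[0:1, : flat.shape[-1] // 2]


def first_channel(x):
    # Alternative: pick channel 0 first, so mono and stereo behave the same.
    return x[0:1, :]


stereo = np.array([[1, 2, 3], [4, 5, 6]])  # 2 channels x 3 samples
mono = np.array([[1, 2, 3, 4]])            # 1 channel x 4 samples

print(flatten_then_halve(stereo))  # channel 0 of the stereo input
print(flatten_then_halve(mono))    # only the first half of the mono input
print(first_channel(mono))         # the full mono input
```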
Seems like there are still a lot of issues with this approach. :/
The LibriTTS dataset is only available at 24 kHz, so you would need to find a new dataset to retrain at 44.1 kHz.
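Upsampling LibriTTS would not help. A 24 kHz recording contains no content above 12 kHz (its Nyquist frequency), so resampling it to 44.1 kHz adds samples but no high-frequency detail. Before retraining on a candidate dataset, you could check its sample rates with a stdlib sketch like this one. The function names are hypothetical. Python's wave module only reads uncompressed integer PCM WAV files, so other formats need a different reader.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rate_histogram(root):
    """Count WAV files under root by sample rate in Hz."""
    rates = Counter()
    for path in Path(root).rglob("*.wav"):
        with wave.open(str(path), "rb") as f:
            rates[f.getframerate()] += 1
    return rates


def ready_for_44k(root):
    # True only if at least one WAV was found and every one is at 44.1 kHz.
    rates = sample_rate_histogram(root)
    return bool(rates) and set(rates) == {44100}
```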