aero
Predict on CPU
As I understand from the code, CUDA support is hardcoded, so I changed the device to cpu
and replaced model.cuda()
with model.cpu()
But when I run predict, I get a strange error:
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 160]
I don't know whether this is a problem with the CPU or something else.
Hi, thank you for trying our code! This might be because of the dimensions of your input. If I recall correctly, we assume that the wav file the path points to is a single-channel (mono) wav file. Could it be that your input is stereo instead of mono?
Also, the audio file should not be too short. I think it should be at least 1 second long. Let me know if this helps, and share anything else that would help me reproduce the bug myself.
Best, M
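For anyone hitting the same error, here is a minimal sketch for checking a file before running predict. It is not part of the aero repository; the function names `inspect_wav` and `stereo_to_mono` are hypothetical, and it assumes 16-bit PCM WAV input. It reports the channel count, sample rate, and duration, and averages a stereo file down to mono using only the Python standard library.

```python
import array
import sys
import wave


def inspect_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return channels, rate, duration


def stereo_to_mono(src_path, dst_path):
    """Average both channels of a 16-bit stereo PCM WAV into a mono WAV."""
    with wave.open(src_path, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM input")
        rate = wf.getframerate()
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved L, R, L, R, ...; average each pair.
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(mono.tobytes())
```

If `inspect_wav` reports 2 channels or a duration well under 1 second, that matches the two conditions mentioned above. The error in this thread reports a padding of 256 against an input dimension of only 160.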
Yes, you are right, my input was stereo, thank you. Now it works, but the output is much worse than the original.
Which checkpoint were you using? What are the source and target sample rates?
I use this checkpoint: https://drive.google.com/drive/folders/1JK9VqgfQsWEPOFUkp9Y5OR62G9i3disf
The source is 12 kHz and the output file is generated at 16 kHz. As I understand it, I ran the wrong command:
python predict.py dset=4-16 experiment=aero_4-16_512_256
but it should be
python predict.py dset=12-48 experiment=aero_12-48_512_256
but this repository contains only 4-16 experiments and no 12-48 experiment files. Did you forget to add them, or should I create them manually?
Yes, you are right - you need to modify the configuration file. If I recall correctly, the only things you need to change are the sampling rates. From:
lr_sr: 4000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 16000 # high resolution sample rate. Should be included in training cfg
to:
lr_sr: 12000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 48000 # high resolution sample rate. Should be included in training cfg