
Predict on CPU

Open patriotyk opened this issue 1 year ago • 5 comments

As I understand from the code, CUDA support is hardcoded. So I changed the device to cpu and replaced model.cuda() with model.cpu(). But when I run predict, I get a strange error:

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 160]

I don't know whether this is a problem with the CPU or something else.

patriotyk avatar Aug 08 '23 14:08 patriotyk

Hi, thank you for trying our code! This might be because of the dimensions of your input. If I recall correctly, we assume that the path points to a single-channel (mono) wav file. Could it be that your input is stereo instead of mono?

Also, the audio file should not be too short; I think it should be at least 1 second long. Let me know if this helps, and share anything else that would help me reproduce the bug myself.
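
Both conditions can be checked before running predict. The following is a minimal sketch using only the Python standard library, assuming a 16-bit PCM wav file; `inspect_wav` and `downmix_to_mono` are hypothetical helper names and are not part of the repository.

```python
import array
import sys
import wave


def inspect_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        duration = wf.getnframes() / sample_rate
    return channels, sample_rate, duration


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM wav and write a mono wav.

    Assumption: 16-bit samples only; other widths raise ValueError.
    """
    with wave.open(src_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        frames = array.array("h", wf.readframes(wf.getnframes()))

    # wav data is little-endian; array uses the native byte order.
    if sys.byteorder == "big":
        frames.byteswap()

    # Samples are interleaved (L, R, L, R, ...); average each frame.
    mono = array.array(
        "h",
        (
            sum(frames[i:i + channels]) // channels
            for i in range(0, len(frames), channels)
        ),
    )

    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(mono.tobytes())
```

If `inspect_wav` reports more than one channel, `downmix_to_mono` produces a mono copy to pass to predict instead; a reported duration under about 1 second would match the second condition above.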

Best, M

m-mandel avatar Aug 08 '23 20:08 m-mandel

Yes, you are right, my input was stereo, thank you. Now it works, but the output is much worse than the original.

patriotyk avatar Aug 09 '23 06:08 patriotyk

Which ckpt were you using? What are the source and target sample rates?

m-mandel avatar Aug 10 '23 05:08 m-mandel

I use this checkpoint https://drive.google.com/drive/folders/1JK9VqgfQsWEPOFUkp9Y5OR62G9i3disf

The source is 12 kHz and the output file is generated at 16 kHz. As I understand it, I ran the wrong command:

python predict.py dset=4-16 experiment=aero_4-16_512_256 

but it should be

python predict.py dset=12-48 experiment=aero_12-48_512_256

but this repository contains only 4-16 experiments and no 12-48 experiment files. Did you forget to add them, or should I create them manually?

patriotyk avatar Aug 10 '23 09:08 patriotyk

Yes, you are right: you need to modify the configuration file. If I recall correctly, the only things you need to change are the sampling rates. From:

lr_sr: 4000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 16000 # high resolution sample rate. Should be included in training cfg

to:

lr_sr: 12000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 48000 # high resolution sample rate. Should be included in training cfg
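
The same change could be scripted rather than made by hand. This is a minimal sketch that rewrites the two keys in the text of a config file while keeping the trailing comments; `retarget_sample_rates` is a hypothetical helper, and where the 4-16 experiment file lives and what the new file should be named are left to the reader, since the thread does not specify them.

```python
import re


def retarget_sample_rates(cfg_text, lr_sr, hr_sr):
    """Return cfg_text with the lr_sr and hr_sr values replaced.

    Assumption: each key appears exactly once, at the start of a line,
    followed by an integer value (as in the snippet above).
    """
    def swap(key, value, text):
        pattern = re.compile(rf"^(\s*{key}:\s*)\d+", flags=re.MULTILINE)
        new_text, count = pattern.subn(rf"\g<1>{value}", text)
        if count != 1:
            raise ValueError(f"expected exactly one '{key}:' line, found {count}")
        return new_text

    cfg_text = swap("lr_sr", lr_sr, cfg_text)
    cfg_text = swap("hr_sr", hr_sr, cfg_text)
    return cfg_text


original = (
    "lr_sr: 4000 # low resolution sample rate\n"
    "hr_sr: 16000 # high resolution sample rate\n"
)
updated = retarget_sample_rates(original, 12000, 48000)
print(updated)
```

Both settings keep the same 4x ratio between low and high sample rates (4000 to 16000, 12000 to 48000).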

m-mandel avatar Aug 10 '23 17:08 m-mandel