diffwave-sr
Speech Super-resolution with Unconditional Diffwave
Source code for the paper "Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution".
Training
- Install python requirements.
pip install -r requirements.txt
- Please convert all the data files into .wav format and put them under the same directory. The following command will train a 48 kHz UDM.
python train.py model.res_channels=64 epochs=50 sr=48000 train_T=0 dataset.size=120000 dataset.segment=32768 dataset.data_dir=/your/vctk/train/set/ loader.batch_size=12 scheduler.patience=1000000
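The data preparation step mentioned above (everything as .wav, in one directory) could be sketched roughly as follows. This is only an illustration and not part of the repository: it assumes the raw corpus is stored as FLAC files in nested speaker folders and that the third-party soundfile package is installed. The helper names flat_wav_path and convert_to_wav are hypothetical.

```python
from pathlib import Path


def flat_wav_path(src: Path, src_root: Path, dst_dir: Path) -> Path:
    """Map a nested audio file to a unique .wav name inside one flat directory."""
    rel = src.relative_to(src_root).with_suffix(".wav")
    # Join the sub-folder names into the file name so files cannot collide.
    return dst_dir / "_".join(rel.parts)


def convert_to_wav(src_root: str, dst_dir: str, pattern: str = "*.flac") -> int:
    """Decode every file matching `pattern` under src_root and write it as .wav.

    The sample rate is kept as-is; no resampling is performed.
    Returns the number of files written.
    """
    # Third-party dependency (assumption), imported lazily so that
    # flat_wav_path stays usable without it.
    import soundfile as sf

    src_root_p, dst_dir_p = Path(src_root), Path(dst_dir)
    dst_dir_p.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_root_p.rglob(pattern)):
        audio, sample_rate = sf.read(str(src))
        dst = flat_wav_path(src, src_root_p, dst_dir_p)
        sf.write(str(dst), audio, sample_rate)
        count += 1
    return count
```

A call such as convert_to_wav("/your/vctk/raw/", "/your/vctk/train/set/") would then fill the directory passed as dataset.data_dir; the source path here is a placeholder, and splitting the corpus into train and test sets is left out of the sketch.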
Evaluation
The numbers in the paper can be reproduced with the following commands.
- rate: the upscaling ratio.
- downsample-type: the downsampling filter.
- infer-type: the upscaling method.
- lr: the $\eta$ value in the paper.
Spline Interpolation
python vctk_dsp_baseline.py /your/vctk/test/set/ --downsample-type sinc --infer-type spline --rate 2
UDM+
python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --rate 2 -T 50 --infer-type manifold --downsample-type stft --lr 0.67
UDM+ without MCG
python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --rate 3 -T 50 --infer-type inpainting --downsample-type sinc
NU-Wave(+)
The checkpoint of UDM is used for noise scheduling.
For training NU-Wave, please refer to here. For evaluating NU-Wave+, change infer-type to nuwave-manifold and specify the value of lr.
python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --nuwave-ckpt /XXXX/checkpoints_nuwave_x2/nuwave_x2_01_07_22_epoch\=645_EMA --rate 2 -T 50 --infer-type nuwave --downsample-type stft
NU-Wave 2(+)
The checkpoint of UDM is used for noise scheduling.
For training NU-Wave 2, please refer to here. For evaluating NU-Wave 2+, change infer-type to nuwave2-manifold and specify the value of lr.
python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --nuwave-ckpt /XXXX/nuwave2_08_14_09_epoch\=72_EMA --rate 3 -T 50 --infer-type nuwave2 --downsample-type sinc
We'll release the script for evaluating WSRGlow and NVSR in the future.
Pre-trained Checkpoints
- 48 kHz
- 16 kHz
Extending to non-zero phase response lowpass filters
Downsampling audio with an IIR lowpass filter introduces non-linear phase delays, which breaks the assumption that the frequency mask is real-valued. An easy way to compensate for the delays is to apply the same filter again during upsampling, but backward in time. We repeated the 48 kHz experiment from the paper with an 8th-order Chebyshev Type I lowpass filter.
| | 2x | 3x |
|---|---|---|
| NU-Wave | 0.87 | 1.00 |
| NU-Wave 2 | 0.73 | 0.87 |
| NU-Wave+ | 1.03 | 1.32 |
| NU-Wave 2+ | 0.86 | 1.00 |
| UDM+ | 0.64 | 0.79 |
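The backward-in-time filtering idea described in this section can be checked numerically. If the filter has frequency response $H(\omega)$, one forward pass followed by one time-reversed pass has the combined response $|H(\omega)|^2$, which is real and non-negative. The sketch below is not the repository's code: it uses SciPy, ignores the decimation step that sits between the two passes in the real pipeline, and the ripple (0.05 dB) and cutoff (0.8 / rate of Nyquist) are assumed values, since only the filter order and type are stated above.

```python
import numpy as np
from scipy import signal


def forward_backward(x, sos):
    """Filter x forward in time, then once more backward in time."""
    y = signal.sosfilt(sos, x)
    return signal.sosfilt(sos, y[::-1])[::-1]


rate = 2  # upscaling ratio
# 8th-order Chebyshev Type I lowpass; ripple and cutoff are assumptions.
sos = signal.cheby1(8, 0.05, 0.8 / rate, btype="low", output="sos")

# Put a unit impulse in the middle so both causal and anti-causal tails fit.
n = 4096
impulse = np.zeros(n)
impulse[n // 2] = 1.0

h_single = signal.sosfilt(sos, impulse)   # one forward pass only
h_both = forward_backward(impulse, sos)   # forward pass + backward pass

# Shift the impulse position to index 0 before the FFT so that any
# remaining phase comes from the filter and not from the offset.
H_single = np.fft.rfft(np.roll(h_single, -n // 2))
H_both = np.fft.rfft(np.roll(h_both, -n // 2))

print("max |imag|, single pass:     ", np.max(np.abs(H_single.imag)))
print("max |imag|, forward-backward:", np.max(np.abs(H_both.imag)))
```

The single pass leaves a clearly complex response, while the forward-backward response is real up to floating-point error and matches the squared magnitude of the single pass. When both passes can be applied to the same signal in one place, scipy.signal.sosfiltfilt performs this forward-backward filtering directly.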