auraloss
auraloss copied to clipboard
Defaults for Multiscale STFT loss
fft_sizes=[1024, 2048, 512],
hop_sizes=[120, 240, 50],
win_lengths=[600, 1200, 240],
These are the defaults provided. What sample rate are they intended for?
(Just curious, how did you choose them? But desired sample rate is more important for me.)
This is a good question and likely should be added to the docstring.
These are the values from the paper we based the implementation on https://arxiv.org/abs/1910.11480. Based on the paper they are meant for audio at 24 kHz. I generally do not use these default values in most of my setups which are at a higher sample rate. DDSP opted to use a larger number of window and frame sizes which perhaps mitigates somewhat the variability across sample rates.
Yeah. I guess I take a more hardcore mindset here and believe that NO defaults should be provided, and the docstring should give a few examples (with associated SRs) and their cites. The way it is now, it's a bit easy to footgun yourself I think?