
Defaults for Multiscale STFT loss

Open turian opened this issue 1 year ago • 2 comments

        fft_sizes=[1024, 2048, 512],
        hop_sizes=[120, 240, 50],
        win_lengths=[600, 1200, 240],

These are the defaults provided. What sample rate are they intended for?

(Just curious, how did you choose them? But desired sample rate is more important for me.)

turian avatar Sep 12 '22 04:09 turian

This is a good question, and the answer should probably be added to the docstring.

These are the values from the paper the implementation is based on: https://arxiv.org/abs/1910.11480. According to the paper, they are meant for audio at 24 kHz. I generally do not use these defaults, since most of my setups run at a higher sample rate. DDSP opted for a larger number of window and frame sizes, which perhaps mitigates the variability across sample rates somewhat.
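
To make the sample-rate dependence concrete, here is a minimal sketch that converts the defaults to milliseconds at 24 kHz and rescales them for 48 kHz. The helper names `to_ms` and `scale_resolutions` are hypothetical and not part of auraloss, and linear scaling with sample rate is only one plausible heuristic, not something the paper or the library prescribes.

```python
# Defaults quoted in this issue; per the paper they target 24 kHz audio.
DEFAULT_SR = 24_000
FFT_SIZES = [1024, 2048, 512]
HOP_SIZES = [120, 240, 50]
WIN_LENGTHS = [600, 1200, 240]


def to_ms(samples, sample_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sample_rate


def scale_resolutions(sizes, target_sr, source_sr=DEFAULT_SR):
    """Hypothetical helper: scale sample counts linearly with the sample rate.

    This keeps the window and hop durations (in ms) roughly constant.
    It is an assumption for illustration, not an auraloss function.
    """
    return [round(s * target_sr / source_sr) for s in sizes]


# Durations implied by the defaults at 24 kHz.
win_ms = [to_ms(w, DEFAULT_SR) for w in WIN_LENGTHS]
hop_ms = [to_ms(h, DEFAULT_SR) for h in HOP_SIZES]

# The same durations expressed in samples at 48 kHz.
fft_48k = scale_resolutions(FFT_SIZES, 48_000)
hop_48k = scale_resolutions(HOP_SIZES, 48_000)
win_48k = scale_resolutions(WIN_LENGTHS, 48_000)

print("window lengths (ms) at 24 kHz:", win_ms)
print("48 kHz fft/hop/win:", fft_48k, hop_48k, win_48k)
```

The rescaled lists could then be passed explicitly as `fft_sizes`, `hop_sizes`, and `win_lengths` instead of relying on the defaults. For ratios that are not a whole number (for example 44.1 kHz), this scaling produces FFT sizes that are no longer powers of two, so rounding to a nearby power of two may be preferable.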

csteinmetz1 avatar Sep 13 '22 23:09 csteinmetz1

Yeah. I guess I take a more hardcore mindset here and believe that NO defaults should be provided, and the docstring should give a few examples (with associated SRs) and their cites. The way it is now, it's a bit easy to footgun yourself I think?

turian avatar Sep 13 '22 23:09 turian