sigsep-mus-eval
add sdr metric from MDX challenge
We want to add the simplified SDR metric from the MDX challenge, but we realized that calling it "SDR" isn't clear. So we are willing to rename the simplified metric:
Please vote:
1️⃣ uSDR as coined by @luo42 in https://arxiv.org/abs/2209.15174
2️⃣ NSDR as coined by @adefossez in https://github.com/facebookresearch/demucs
3️⃣ SDR as used in the Challenge https://www.aicrowd.com/challenges/sound-demixing-challenge-2023
I got some feedback that this new metric is really just an SNR. Do you think SNR would be a good name? Maybe TSNR, for track-level SNR, to emphasize that it is computed over the entire track, not over segments? Because SDR was scale-invariant and the new one is not, reusing a name as close as NSDR could be a bit misleading, even if nothing in "SDR" actually says scale-invariant.
Hi Alexandre, actually only the source version of BSS Eval is scale-invariant; the image version is not (see, e.g., Eq. (2.1) on page 25 of https://theses.hal.science/tel-01684685/document).
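For concreteness, the scale-(in)variance distinction being discussed can be sketched in a few lines of NumPy. This is a minimal sketch, not the challenge's actual implementation, and the function names are mine:

```python
import numpy as np

def snr(reference, estimate, eps=1e-8):
    """Plain SNR over a whole signal: sensitive to rescaling the estimate."""
    noise = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps)
                         / (np.sum(noise ** 2) + eps))

def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR: project the estimate onto the reference first,
    so multiplying the estimate by any nonzero gain leaves the score unchanged."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    return snr(alpha * reference, estimate, eps)
```

With these definitions, `si_snr(ref, est)` equals `si_snr(ref, 2 * est)`, while `snr` changes under the same rescaling.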
4️⃣ TSNR track-level SNR as proposed by @adefossez
5️⃣ uSNR utterance-level SNR
Actually, let's do a vote on Twitter: https://twitter.com/faroit/status/1620414432395558912?s=20
To me, this is simply SNR, which can be gamed by rescaling the mixture... I'm actually worried about whether this is the right metric, tbh.
@Jonathan-LeRoux but that is the same problem for the "real" SDR, and for music separation we usually can't really use scale invariance given the applications... 🤷‍♂️
We will have a perceptual part in the challenge this time but we need to drop SNR/SDR very soon, I agree.
Following up on what I wrote on Twitter: I looked at the MDX paper and it doesn't look like the final metric uses any median averaging, it's all classical averages. In which case I think anybody reading "SNR" would imagine they'd compute the SNR of a whole song over the 2 channels, then average that over songs, separately for each instrument (and average again to get the final metric).

Regarding the relevance of the metric and the issue with scale invariance: I agree that allowing scale invariance would be odd for music applications. And in most cases, one can hope that the systems are doing a good enough job at removing other sources that there is no significant game to be played by a simple rescaling. But clearly that's not the case for the mixture, and the rescaled mixture is used as a baseline, which I find misleading. One option could be to ensure mixture consistency across the stems, but that could also penalize some methods...
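If I read that scheme correctly (per-song SNR pooled over both channels, a plain mean over songs per instrument, then a mean over instruments, no medians anywhere), it would look roughly like this. This is a sketch with an assumed data layout, not the challenge's actual code:

```python
import numpy as np

def track_snr(reference, estimate, eps=1e-8):
    """SNR of one whole song, pooled over all channels and samples."""
    noise = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps)
                         / (np.sum(noise ** 2) + eps))

def challenge_score(references, estimates):
    """references/estimates: dict instrument -> list of (channels, samples)
    arrays, one per song. Returns (per-instrument mean SNRs, overall mean)."""
    per_instrument = {
        inst: float(np.mean([track_snr(ref, est)
                             for ref, est in zip(references[inst],
                                                 estimates[inst])]))
        for inst in references
    }
    return per_instrument, float(np.mean(list(per_instrument.values())))
```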