sigsep-mus-eval

add sdr metric from MDX challenge

Open faroit opened this issue 2 years ago • 11 comments
We want to add the simplified SDR metric from the MDX challenge, and we realized that the name "SDR" isn't clear. So we are willing to rename the simplified metric:

Please vote:

faroit avatar Jan 31 '23 10:01 faroit

1️⃣ uSDR as coined by @luo42 in https://arxiv.org/abs/2209.15174

faroit avatar Jan 31 '23 10:01 faroit

2️⃣ NSDR as coined by @adefossez in https://github.com/facebookresearch/demucs

faroit avatar Jan 31 '23 10:01 faroit

3️⃣ SDR as used in the Challenge https://www.aicrowd.com/challenges/sound-demixing-challenge-2023

faroit avatar Jan 31 '23 10:01 faroit

I got some feedback that this new metric is really just an SNR. Do you think SNR would be a good name? Maybe TSNR, for track-level SNR, to emphasize that it is computed over the entire track and not over segments? Because SDR was scale-invariant but the new metric is not, it could be a bit misleading to reuse a name as close as nSDR, even if nothing in "SDR" says scale-invariant.
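For readers following along, here is a minimal sketch of what a track-level SNR could look like. It assumes the metric is the ratio of total reference energy to total error energy, summed over all samples and channels of one track, expressed in dB. The function name `track_snr`, the `eps` stabilizer, and the toy signals are illustrative assumptions, not the challenge's or museval's code.

```python
import math


def track_snr(reference, estimate, eps=1e-12):
    """Track-level SNR in dB (hypothetical helper).

    `reference` and `estimate` are lists of channels, each channel a list
    of samples, e.g. [[left...], [right...]]. Energies are summed over the
    whole track and over all channels; there is no segmentation.
    """
    signal = sum(r * r for channel in reference for r in channel)
    noise = sum(
        (r - e) ** 2
        for ref_channel, est_channel in zip(reference, estimate)
        for r, e in zip(ref_channel, est_channel)
    )
    # eps (an assumed value) avoids division by zero and log(0).
    return 10.0 * math.log10((signal + eps) / (noise + eps))


# Toy stereo reference (made-up numbers) and two estimates of it.
reference = [[1.0, -1.0, 0.5, 0.0], [0.5, 0.5, -0.5, 1.0]]
perfect = [list(channel) for channel in reference]
half_gain = [[0.5 * x for x in channel] for channel in reference]

snr_perfect = track_snr(reference, perfect)
snr_half_gain = track_snr(reference, half_gain)
```

With these toy numbers, the half-gain estimate has the right waveform shape but scores about 6 dB, far below the perfect estimate. That is the sense in which a metric like this is not scale-invariant.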

adefossez avatar Jan 31 '23 12:01 adefossez

Hi Alexandre, actually only the source-version of the BSS Eval is scale-invariant but the image-version is not (see, e.g., (2.1) on page 25 in https://theses.hal.science/tel-01684685/document).

StefanUhlich-sony avatar Jan 31 '23 12:01 StefanUhlich-sony

4️⃣ TSNR track-level SNR as proposed by @adefossez

faroit avatar Jan 31 '23 13:01 faroit

5️⃣ uSNR utterance-level SNR

faroit avatar Jan 31 '23 13:01 faroit

Actually, let's do a vote on Twitter: https://twitter.com/faroit/status/1620414432395558912?s=20

faroit avatar Jan 31 '23 13:01 faroit

To me, this is simply SNR, which can be gamed by rescaling the mixture... I'm actually worried about whether this is the right metric, tbh.
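To make the rescaling concern concrete, here is a small sketch on invented mono data. It uses the mixture itself as the "estimate" of one source and compares the plain mixture against a mixture multiplied by the least-squares gain `alpha = <s, m> / <m, m>`. The helper names `snr_db` and `best_scale` and all signal values are assumptions for illustration only.

```python
import math


def snr_db(reference, estimate, eps=1e-12):
    """Plain SNR in dB between two mono signals (hypothetical helper)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def best_scale(reference, mixture):
    """Gain alpha minimizing ||reference - alpha * mixture||^2."""
    numerator = sum(r * m for r, m in zip(reference, mixture))
    denominator = sum(m * m for m in mixture)
    return numerator / denominator


# Toy sources (made-up numbers): two non-overlapping signals.
vocals = [1.0, 0.0, 1.0, 0.0]
accompaniment = [0.0, 1.0, 0.0, 1.0]
mixture = [v + a for v, a in zip(vocals, accompaniment)]

alpha = best_scale(vocals, mixture)
raw_snr = snr_db(vocals, mixture)
scaled_snr = snr_db(vocals, [alpha * m for m in mixture])
```

Here the untouched mixture scores 0 dB against the vocals, while the rescaled mixture scores about 3 dB, even though no separation happened. The gain in this sketch is computed from the reference, so it is an oracle value; a fixed guess would typically capture part of the same effect.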

Jonathan-LeRoux avatar Jan 31 '23 20:01 Jonathan-LeRoux

@Jonathan-LeRoux but the "real" SDR has the same problem, and given the applications we usually can't really use scale-invariance for music separation... 🤷‍♂️

We will have a perceptual part in the challenge this time but we need to drop SNR/SDR very soon, I agree.

faroit avatar Feb 01 '23 09:02 faroit

Following up on what I wrote on Twitter: I looked at the MDX paper and it doesn't look like the final metric uses any median averaging, it's all classical averages. In which case I think anybody reading "SNR" would imagine they'd compute the SNR of a whole song over the 2 channels then average that over songs, separately for each instrument (and average again to get the final metric). Regarding the relevance of the metric and the issue with scale invariance: I agree that allowing scale invariance would be odd for music applications. And in most cases, one can hope that the systems are doing a good enough job at removing other sources that there is no significant game to be played by a simple rescaling. But clearly that's not the case for the mixture, and the rescaled mixture is used as a baseline, which I find misleading. One option could be to ensure mixture consistency across the stems, but that could also penalize some methods...
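The averaging described above can be sketched as follows: one SNR per song and instrument over the whole track and both channels, then a plain mean over songs for each instrument, then a plain mean over instruments. The function names, the nested-dict layout, and the toy signals are assumptions for illustration; this is not the challenge's evaluation code.

```python
import math
from statistics import mean


def track_snr(reference, estimate, eps=1e-12):
    """SNR in dB over all samples and channels of one track."""
    signal = sum(r * r for channel in reference for r in channel)
    noise = sum(
        (r - e) ** 2
        for ref_channel, est_channel in zip(reference, estimate)
        for r, e in zip(ref_channel, est_channel)
    )
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def aggregate(tracks):
    """Average per instrument over songs, then over instruments.

    `tracks` maps song name -> instrument name -> (reference, estimate).
    Plain arithmetic means are used throughout, no medians.
    """
    scores = {}
    for stems in tracks.values():
        for instrument, (reference, estimate) in stems.items():
            scores.setdefault(instrument, []).append(
                track_snr(reference, estimate)
            )
    per_instrument = {name: mean(vals) for name, vals in scores.items()}
    return per_instrument, mean(list(per_instrument.values()))


def scaled(signal, gain):
    """Return a copy of a multichannel signal multiplied by `gain`."""
    return [[gain * x for x in channel] for channel in signal]


# Toy stereo reference with two samples per channel (made-up numbers).
ref = [[1.0, -1.0], [1.0, -1.0]]
tracks = {
    "song_a": {"vocals": (ref, scaled(ref, 0.5)), "bass": (ref, scaled(ref, 0.9))},
    "song_b": {"vocals": (ref, scaled(ref, 0.9)), "bass": (ref, scaled(ref, 0.9))},
}

per_instrument, overall = aggregate(tracks)
```

With these toy values, a gain of 0.9 gives about 20 dB and a gain of 0.5 about 6 dB, so vocals average to roughly 13 dB, bass to roughly 20 dB, and the overall score to roughly 16.5 dB.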

Jonathan-LeRoux avatar Feb 03 '23 13:02 Jonathan-LeRoux