
SDR performance regression since version 1.0.0

Open sevagh opened this issue 3 years ago • 5 comments

Hi all,

So, when I evaluated UMX and X-UMX (to compare against my model, xumx-sliCQ) for my published MDX workshop paper, I reported the following median SDR values:

Our model, xumx-sliCQ, was trained on MUSDB18-HQ. On the test set, xumx-sliCQ achieved a median SDR of 3.6 dB versus the 4.64 dB of UMX and 5.54 dB of X-UMX, https://mdx-workshop.github.io/proceedings/hanssian.pdf

I noticed these didn't align with the published performance of UMX/X-UMX (from the X-UMX paper):

Table 1: Comparison of X-UMX with other public methods in terms of SDR ("median of frames, median of tracks"). Avg.: UMX [16] 5.41, X-UMX (proposed) 5.79 (https://arxiv.org/pdf/2010.04228.pdf)
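As an aside, the "median of frames, median of tracks" aggregation quoted above can be sketched in a few lines of numpy. This is a minimal illustration with made-up SDR numbers, not the actual museval implementation:

```python
import numpy as np

# Hypothetical per-frame SDR values for a few tracks (made-up numbers,
# only to illustrate the aggregation order, not real evaluation output).
per_track_frame_sdrs = [
    np.array([4.0, 5.0, 6.0]),   # track 1: SDR per ~1s evaluation frame
    np.array([5.5, 6.5]),        # track 2
    np.array([3.0, 4.0, 5.0]),   # track 3
]

# Step 1: "median of frames" -- one score per track.
track_scores = [float(np.nanmedian(frames)) for frames in per_track_frame_sdrs]

# Step 2: "median of tracks" -- one score for the whole test set.
overall_sdr = float(np.median(track_scores))

print(track_scores)  # [5.0, 6.0, 4.0]
print(overall_sdr)   # 5.0
```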

UMX is supposed to be closer to X-UMX, not almost 1 dB SDR apart.

The way I generated my old (seemingly incorrect) values of 4.64 dB for UMX and 5.54 dB for X-UMX was with the standard sigsep tools: loop over MUSDB18-HQ (50 test tracks), store the evaluation json file for each track, and then aggregate using the aggregation + boxplot scripts:

  • https://gitlab.com/sevagh/xumx_slicq_extra/-/blob/main/mss_evaluation/mss-oracle-experiments/oracle_eval/trained_models.py#L152-178

Then I added this script (from the sisec 2018 repo):

  • https://gitlab.com/sevagh/xumx_slicq_extra/-/blob/main/mss_evaluation/mss-oracle-experiments/oracle_eval/compare_json_results.py

The X-UMX evaluation, using the same set of json files, now gives me an SDR of 5.91 dB (higher than 5.54 dB), because compare_json_results.py computes a different median (across targets + frames) than the 5.54 I originally reported.
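To make the 5.91 vs 5.54 discrepancy concrete: the same per-frame scores can produce different final numbers depending purely on the order in which medians are taken. I don't know exactly which order each script uses, but here is a toy numpy example with made-up numbers showing the effect:

```python
import numpy as np

# Toy per-frame SDRs for one track, two targets (made-up numbers):
vocals = np.array([1.0, 2.0, 9.0])
drums = np.array([3.0, 4.0, 5.0])

# Order A: median per target first, then median of the target medians.
a = float(np.median([np.median(vocals), np.median(drums)]))  # median([2.0, 4.0]) = 3.0

# Order B: pool all frames of both targets, then take one median.
b = float(np.median(np.concatenate([vocals, drums])))  # median([1,2,3,4,5,9]) = 3.5

print(a, b)  # 3.0 3.5 -- same data, different aggregation, different score
```

Since the median is not linear, "median of medians" and "median of the pooled frames" only coincide by accident, which is enough to explain sub-dB differences between aggregation scripts.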

The UMX evaluation of the old release corresponding to the published paper (https://github.com/sigsep/open-unmix-pytorch/releases/tag/v1.0.0) got much better results, with a median SDR of 5.41 dB (exactly matching the X-UMX publication).

The UMX evaluation at the latest tip of master gives me a median SDR significantly lower than 5.41 dB. I think the new UMX code has a performance regression.

sevagh avatar Apr 21 '22 13:04 sevagh

@sevagh interesting, we have a regression test that compares the SDR against v1.0, so at least on that one track the output doesn't seem to have changed.
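The kind of check such a regression test performs could look roughly like this (a hypothetical sketch: the function name, reference value, and tolerance are illustrative, not the repo's actual test):

```python
import numpy as np

def check_sdr_regression(sdr_new, sdr_reference, atol=0.01):
    """Fail if the median SDR on a fixed reference track drifts from the
    value recorded at v1.0.0 by more than `atol` dB.

    All names and numbers here are illustrative, not the actual test
    in open-unmix-pytorch.
    """
    np.testing.assert_allclose(sdr_new, sdr_reference, atol=atol)

# e.g. evaluating the current code on the fixed excerpt:
check_sdr_regression(sdr_new=5.402, sdr_reference=5.41)  # within 0.01 dB: passes
```

Note that such a test only guards the one excerpt it is run on; a shift that appears across the full 50-track MUSDB18-HQ test set can still slip through.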

did you make sure that the separator settings are the same in your code?

faroit avatar Apr 21 '22 21:04 faroit

@sevagh did you manage to figure out what the issue is?

faroit avatar May 12 '22 08:05 faroit

Forgot to follow up - I will try to replicate later today.

sevagh avatar May 12 '22 17:05 sevagh

The regression test uses the 7s MUSDB18 excerpt (which may not even be in wav format). I wonder if that's related.

sevagh avatar May 16 '22 19:05 sevagh

I think it's because you evaluated on HQ as opposed to the normal musdb18.

faroit avatar May 17 '22 11:05 faroit

@sevagh i think we can close this. I had to lower the regression test tolerance in #144 :-/

faroit avatar Apr 16 '24 10:04 faroit