Matlab and Python BSS Eval comparison
I tried evaluating the same set of files with both the Python and Matlab implementations of BSS eval, and I'm getting different results. Do you know how much deviation is expected? The Python results give a low GNSDR but high GSAR and GSIR, while the Matlab results are consistent with those reported in papers.
This is unfortunately a difficult question to answer, and the causes for deviation are threaded all throughout the computation stack, from the BLAS up to the BSS eval algorithms themselves.
I'm having trouble finding the thread that covers these issues, but here's what I recall from debugging the regression tests:
- BSS eval involves a lot of potentially numerically unstable calculations around matrix inverses. Rather than consistently using a stable method (e.g. a regularized pseudo-inverse), mir_eval first tries to invert the matrix directly, and if that fails, backs off to a pseudo-inverse. It does not tell you when this happens, so diagnosing this behavior is a real pain. See, e.g., here: https://github.com/craffel/mir_eval/blob/master/mir_eval/separation.py#L712-L717
- Whether or not a particular matrix inversion fails is not a deterministic function of the input values. It depends on the underlying linear algebra library (BLAS), and different platforms will produce different behavior even for the same mir_eval code. For instance, a test might pass with OpenBLAS and fail with ATLAS, or vice versa.
- The mir_eval code is a fairly direct port of the Matlab code, which uses the backslash operator for its matrix inversion-projections. In principle, the try-catch block linked above behaves the same way, though it might be more robust to always use a least-squares solver. In practice, the devil's in the details, and we can't crack Matlab open to see how the `\` operator really works.
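To make the branching behavior concrete, here is an illustrative sketch (the helper names are hypothetical, and this is not mir_eval's actual code, though the try/except shape mirrors the linked lines): the silent fallback, versus an unconditional least-squares solve (the closest single-code-path NumPy analogue of `\` for this purpose), versus a Tikhonov-regularized solve:

```python
import numpy as np

def project_fallback(G, D):
    # The mir_eval-style pattern: try a direct solve, and silently back off
    # to a pseudo-inverse (least-squares) solution if G is reported singular.
    # Whether the except branch fires can depend on the BLAS backend.
    try:
        return np.linalg.solve(G, D)
    except np.linalg.LinAlgError:
        return np.linalg.lstsq(G, D, rcond=None)[0]

def project_lstsq(G, D):
    # One SVD-based code path for both full-rank and rank-deficient G:
    # no hidden branch whose trigger depends on the backend.
    return np.linalg.lstsq(G, D, rcond=None)[0]

def project_regularized(G, D, eps=1e-10):
    # Tikhonov regularization: also a single code path, at the cost of a
    # small, explicit bias eps.
    return np.linalg.solve(G + eps * np.eye(G.shape[0]), D)

G_ok = np.array([[2.0, 0.0], [0.0, 3.0]])
D_ok = np.array([4.0, 9.0])
print(project_fallback(G_ok, D_ok))    # [2. 3.]

G_bad = np.array([[1.0, 1.0], [1.0, 1.0]])
D_bad = np.array([2.0, 2.0])
print(project_fallback(G_bad, D_bad))  # minimum-norm solution: [1. 1.]
```

For an invertible `G` all three agree (up to `eps`); they only diverge in the near-singular regime, which is exactly where the Matlab and Python results can drift apart.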
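Relatedly, one backend-independent way to tell whether your own material sits in the danger zone is to check the condition number of the Gram matrix before anything gets inverted. This is a hypothetical diagnostic (`gram_is_risky` is not a mir_eval function), sketched here on synthetic sources:

```python
import numpy as np

def gram_is_risky(sources, tol=1e12):
    # Build the Gram matrix of the sources (the kind of matrix the BSS eval
    # projections invert) and check its condition number. A huge condition
    # number marks the regime where different BLAS backends can legitimately
    # disagree about whether the inversion "fails".
    G = sources @ sources.T
    cond = np.linalg.cond(G)
    return cond, cond > tol

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))
# A third "source" that is a near-duplicate of the first:
near_dup = s[0] + 1e-9 * rng.standard_normal(1000)
cond, risky = gram_is_risky(np.vstack([s, near_dup]))
```

With the near-duplicate row included, `risky` comes back true; on the two independent random sources alone it does not.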
To summarize: I think you don't see divergence across Matlab installations because Matlab ships a common linear algebra stack. However, the fact that these divergences pop up at all indicates that the BSS eval definitions are not inherently stable (or not sufficiently specified), and their outputs should always be taken with a grain of salt.
Having looked at a lot of regression test failures between runs that should produce identical outputs, my gut feeling is that anything past the first decimal place is unreliable. But this is anecdotal rather than rigorous evidence, so take it with a grain of salt.
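In practice, that rule of thumb amounts to comparing the two implementations' outputs with an absolute tolerance of about 0.1 dB rather than expecting bitwise agreement. The numbers below are fabricated purely for illustration:

```python
import numpy as np

# Hypothetical per-track SDR values from the two implementations
# (made up for illustration only):
matlab_sdr = np.array([5.23, -1.09, 12.50])
python_sdr = np.array([5.19, -1.10, 12.55])

# Compare to the first decimal place, not bitwise:
agree = np.allclose(matlab_sdr, python_sdr, atol=0.1)
```

Here `agree` is true at a 0.1 tolerance, but a stricter 0.01 tolerance would flag these same results as divergent.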
cc @faroit @aliutkus @ecmjohnson
Aha, reference thread: https://github.com/craffel/mir_eval/issues/239
I think @bmcfee has covered most of the important information above. The regression testing was challenging due to differences caused by the particular installed BLAS backend; the implementation itself, however, is a direct parallel of the Matlab code. If you want to see a comparison of mir_eval and BSS_EVAL on a couple of real tracks, you can have a look here. This might let you investigate the differences in your particular case as well.
> but the devil's in the details, and we can't crack Matlab open to see how the `\` operator really works.
Stack overflow to the rescue: apparently we CAN (sort of) peek inside: https://scicomp.stackexchange.com/questions/1001/how-does-the-matlab-backslash-operator-solve-ax-b-for-square-matrices