Inconsistency between museval and speechmetrics-bsseval

Open haoheliu opened this issue 3 years ago • 0 comments

I wrote the following code to compare the behavior of museval and the speechmetrics bsseval metric.

from museval.metrics import bss_eval
import speechmetrics as sm
import numpy as np

metrics = sm.load(['bsseval'], window=1)

# [nsrc, nsample, nchannels]: a single audio source with two channels
ref = np.random.randn(1, 44100 * 3, 2)
est = np.random.randn(1, 44100 * 3, 2)

# museval: window and hop are given in samples
res = bss_eval(ref, est, window=44100, hop=44100)

# speechmetrics: pass the [nsample, nchannels] arrays, estimate first
bsseval = metrics(est[0, ...], ref[0, ...], rate=44100)

print(res)
print(bsseval)

It outputs the following:

Loaded  speechmetrics.relative.bsseval
(array([[-3.02169448, -2.98148236, -3.01738321]]), array([[-0.03463801, -0.03900151, -0.0400294 ]]), array([[inf, inf, inf]]), array([[-21.09888836, -21.01320054, -21.05034071]]), array([[0]]))
{'sdr': array([[-2.99676764, -2.98088619, -2.99560498],
       [-3.04682562, -2.98208233, -3.03924334]]), 'isr': array([[-0.01493135, -0.01706893, -0.01804728],
       [-0.02410121, -0.02879832, -0.0266307 ]]), 'sar': array([[-21.00349928, -20.94294041, -20.91428113],
       [-21.19664823, -21.08528379, -21.18806085]])}

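It seems that speechmetrics treats the two channels as two sources: each of its result arrays has two rows, while museval returns a single row for the single source.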
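A small shape check makes the discrepancy easier to see. This is only a sketch: the SDR values are copied from the printed output above, the variable names are made up for illustration, and it relies on museval's `bss_eval` returning arrays with the source axis first and the frame axis second.

```python
import numpy as np

# SDR values copied from the printed output above (names are illustrative).
sdr_museval = np.array([[-3.02169448, -2.98148236, -3.01738321]])
sdr_speechmetrics = np.array([[-2.99676764, -2.98088619, -2.99560498],
                              [-3.04682562, -2.98208233, -3.03924334]])

# First axis: sources, second axis: frames (3 one-second frames here).
print(sdr_museval.shape)        # (1, 3) -> one source
print(sdr_speechmetrics.shape)  # (2, 3) -> two "sources", i.e. the two channels
```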

Changing the call in bsseval.py:16 as follows solves the problem:

result = bss_eval(reference_sources=audios[1][None,...], # shape: [nsrc, nsample, nchannels]
                estimated_sources=audios[0][None,...],
                window=self.bss_window * rate,
                hop=self.bss_hop * rate)
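To illustrate what the `[None, ...]` indexing in the proposed change does, here is a NumPy-only sketch. The array name `stereo` is hypothetical, and how speechmetrics arranges the axes before this change is not shown in this issue, so the sketch only covers the fixed version.

```python
import numpy as np

# A stereo signal as passed to speechmetrics: [nsample, nchannels]
stereo = np.random.randn(44100 * 3, 2)

# Indexing with None prepends a source axis of length 1,
# giving [nsrc, nsample, nchannels] with nsrc == 1.
as_one_source = stereo[None, ...]

print(stereo.shape)         # (132300, 2)
print(as_one_source.shape)  # (1, 132300, 2)
```

With the leading axis of length 1, the two channels stay on the last axis and the signal is evaluated as a single two-channel source.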

haoheliu, Feb 24 '21 08:02