Enhancement: New metric for source separation, separately measuring bleed and fullness in separated audio
Hi,
I've found a simple way to objectively measure bleed and fullness in the context of music source separation. I think it could be useful, as I haven't seen any existing objective metric that does this, while it's a common question from users.
Here is the code for the metric:
```python
import numpy as np
import librosa

def bleed_full(ref, est, sr=44100):
    """Return (bleedness, fullness) scores; 100 is a perfect score for both.
    ref and est must be mono signals of the same length."""
    # STFT parameters
    n_fft = 4096
    hop_length = 1024
    n_mels = 512

    # Compute magnitude STFTs
    D1 = np.abs(librosa.stft(ref, n_fft=n_fft, hop_length=hop_length))
    D2 = np.abs(librosa.stft(est, n_fft=n_fft, hop_length=hop_length))

    # Convert to mel spectrograms
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    S1_mel = np.dot(mel_basis, D1)
    S2_mel = np.dot(mel_basis, D2)

    # Convert to decibels
    S1_db = librosa.amplitude_to_db(S1_mel)
    S2_db = librosa.amplitude_to_db(S2_mel)

    # dB difference: positive bins = content added vs. the reference (bleed),
    # negative bins = content missing from the estimate (lost fullness)
    diff = S2_db - S1_db

    # Separate positive and negative differences
    positive_diff = diff[diff > 0]
    negative_diff = diff[diff < 0]

    # Average each side (0 if that side is empty)
    average_positive = np.mean(positive_diff) if len(positive_diff) > 0 else 0
    average_negative = np.mean(negative_diff) if len(negative_diff) > 0 else 0

    # Scale so that 100 is the best score
    bleedness = 100 / (average_positive + 1)
    fullness = 100 / (-average_negative + 1)
    return bleedness, fullness
```
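For reference, a minimal usage sketch (the file names are placeholders of my own; the two signals are trimmed to a common length so the STFT shapes match):

```python
# Hypothetical file names, for illustration only
ref, sr = librosa.load("vocals_reference.wav", sr=44100, mono=True)
est, _ = librosa.load("vocals_estimate.wav", sr=44100, mono=True)

n = min(len(ref), len(est))  # align lengths before comparing spectrograms
bleedness, fullness = bleed_full(ref[:n], est[:n], sr=sr)
print(f"bleedness: {bleedness:.2f} / 100, fullness: {fullness:.2f} / 100")
```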
I guess it could be adapted as a loss, but I'm not a dev/scientist and I lack the knowledge to make it bulletproof; if it's worth doing, you'll know better than me.
The same concept can be used to draw spectrograms, for example with bleed/positive values in red, missing content/negative values in blue, and perfect separation (= 0) in white:
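A minimal sketch of such a plot, reusing the same mel-dB pipeline as `bleed_full` (the diverging `bwr` colormap and the symmetric color range are my own choices for mapping 0 dB to white):

```python
import numpy as np
import librosa
import matplotlib.pyplot as plt

def plot_bleed_full(ref, est, sr=44100, n_fft=4096, hop_length=1024, n_mels=512):
    # Same mel-dB pipeline as in bleed_full
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    S1_db = librosa.amplitude_to_db(
        mel_basis @ np.abs(librosa.stft(ref, n_fft=n_fft, hop_length=hop_length)))
    S2_db = librosa.amplitude_to_db(
        mel_basis @ np.abs(librosa.stft(est, n_fft=n_fft, hop_length=hop_length)))
    diff = S2_db - S1_db

    # Symmetric color range so a 0 dB difference maps to white in 'bwr'
    v = np.max(np.abs(diff))
    plt.figure(figsize=(10, 4))
    plt.imshow(diff, aspect="auto", origin="lower", cmap="bwr", vmin=-v, vmax=v)
    plt.colorbar(label="dB difference")
    plt.title("red = bleed, blue = missing content, white = perfect separation")
    plt.xlabel("frames")
    plt.ylabel("mel bins")
    plt.show()
```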
@jarredou I'm curious about this. So basically:
Instead of computing an L1 mel spectral distance, you separate it into two components:
- Bleed = anything ADDED to the target spectrogram
- (Negative) fullness = anything REMOVED from the target spectrogram
I see you do MSS work. I noted in the BS-Roformer paper that the authors wrote: "our model outputs gained more preference from musicians and educators than from music producers in the listening test of SDX23". To my ears, BS-Roformers seem to have less bleed but also less fullness. I'd be curious if you have any numbers to share. (cc @ZFTurbo)
@turian Yeah, that's the simple idea behind the 2 metrics.
About the BS-Roformer quote: it's from the final paper of the SDX/MDX23 contest, https://arxiv.org/pdf/2308.06979
We don't have numbers comparing different neural network models. For now, the metrics were only used to evaluate different fine-tuned versions made on top of Kimberley's Melband-Roformer model; the results are accessible here: https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit (evaluation made using the mvsep.com multisong eval dataset).
ZFTurbo added a torch version of the metric to his training script a few days ago.
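I haven't checked ZFTurbo's actual implementation, but a torch port could look roughly like this (a sketch of my own using torchaudio; the function name and parameters are my assumptions):

```python
import torch
import torchaudio

# Mel magnitude spectrogram with the same parameters as the numpy version
_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=44100, n_fft=4096, hop_length=1024, n_mels=512, power=1.0
)

def bleed_full_torch(ref: torch.Tensor, est: torch.Tensor):
    # Hypothetical torch port of bleed_full -- not ZFTurbo's actual code.
    # amplitude_to_DB with multiplier=20 mirrors librosa.amplitude_to_db.
    S1 = torchaudio.functional.amplitude_to_DB(_mel(ref), 20.0, 1e-5, 0.0)
    S2 = torchaudio.functional.amplitude_to_DB(_mel(est), 20.0, 1e-5, 0.0)
    diff = S2 - S1

    pos = diff[diff > 0]
    neg = diff[diff < 0]
    avg_pos = pos.mean() if pos.numel() > 0 else diff.new_zeros(())
    avg_neg = neg.mean() if neg.numel() > 0 else diff.new_zeros(())

    bleedness = 100.0 / (avg_pos + 1.0)
    fullness = 100.0 / (-avg_neg + 1.0)
    return bleedness, fullness
```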
Little update: the metric was used as a loss to emphasize fullness on a vocals model, and it does a great job at that task, especially at extracting the reverb more fully (it also gives more clarity to the vocal consonants in the high-frequency range).
The 1st pic is Kim's original model; the 2nd one is the fine-tuned version emphasizing vocal fullness (at the cost of a slightly noisier separation):
(All these experiments are done inside the Audio Separation Discord community; invite: https://discord.gg/ndC4UmPZwZ)
@jarredou Hi! I'm interested in this metric. Did you use it directly, on its own, as the loss, using the definition you mentioned? When you mention 'emphasize', does that imply it should be combined with other traditional losses like a waveform loss or an STFT loss?
Can we just use an STFT magnitude loss with different weights when predict > target and when predict < target?
Setting a larger weight when predict > target would mean emphasizing bleedlessness, and vice versa.
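To make the idea concrete, here is a minimal sketch of such an asymmetrically weighted STFT magnitude loss (my own illustration; `w_over` and `w_under` are hypothetical knobs, not taken from any existing training script):

```python
import torch

def asymmetric_stft_mag_loss(pred, target, n_fft=4096, hop_length=1024,
                             w_over=2.0, w_under=1.0):
    """L1 magnitude-STFT loss with different weights for over- and
    under-estimation of the target spectrogram."""
    window = torch.hann_window(n_fft, device=pred.device)
    P = torch.stft(pred, n_fft, hop_length, window=window,
                   return_complex=True).abs()
    T = torch.stft(target, n_fft, hop_length, window=window,
                   return_complex=True).abs()
    diff = P - T
    # Positive diff: energy the model added (bleed).
    # Negative diff: energy the model removed (lost fullness).
    loss = torch.where(diff > 0, w_over * diff, -w_under * diff)
    return loss.mean()
```

With `w_over > w_under` the model is penalized more for adding energy, i.e. it emphasizes bleedlessness; swapping the weights emphasizes fullness instead, matching the "vice versa" above.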