
Deep filter spectrogram normalisation

Open · mattpitkin opened this issue 5 months ago · 1 comment

I have noticed that the normalisation of the complex spectrogram features for the deep filtering is not doing what is expected (as described in, e.g., equation 12 of https://ieeexplore.ieee.org/document/9855850). In the band_unit_norm and band_unit_norm_t functions in lib.rs, the running estimates of the mean of the absolute values of the spectrogram (i.e., estimates proportional to the standard deviation of the spectrogram) are square-rooted before being used for normalisation. I don't think the square root should be applied: the quantity being estimated is already the mean magnitude, not the variance, so square-rooting it leaves the features on the wrong scale.
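As a standalone illustration (synthetic data, not the DeepFilterNet code itself): dividing a spectrogram by the square root of its mean magnitude leaves the result dependent on the signal level, whereas dividing by the mean magnitude itself gives unit scale.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical complex spectrogram samples with small magnitude (~0.01)
x = 0.01 * (rng.standard_normal(10000) + 1j * rng.standard_normal(10000))

mean_abs = np.abs(x).mean()        # estimate of the mean magnitude E|X|

with_sqrt = x / np.sqrt(mean_abs)  # current code: square root applied
without_sqrt = x / mean_abs        # proposed fix: no square root

print(np.abs(with_sqrt).mean())    # far from 1 (depends on the signal level)
print(np.abs(without_sqrt).mean()) # ~1 regardless of the signal level
```

The same arithmetic explains the numbers below: with the square root the normalised features inherit the (small) scale of the input, without it they come out near unit scale.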

I've tested the spectrograms with and without the square root on the noisy_snr0.wav file. With the square rooting (i.e., the current code), I get:

from df.enhance import init_df, df_features
from df.io import load_audio

model, df_state, _ = init_df()
audio, meta = load_audio("noisy_snr0.wav", 48000, "cpu")
spec, erb_spec, feat_spec = df_features(audio, df_state, 96)

# get standard deviation across time for each frequency
# (the two columns are the real and imaginary parts)
feat_spec.squeeze().std(axis=0)
tensor([[0.1591, 0.0000],
        [0.0860, 0.0697],
        [0.0549, 0.0647],
        [0.0660, 0.0672],
        [0.0794, 0.0768],
        [0.0854, 0.0898],
        [0.0815, 0.0774],
        [0.0863, 0.0810],
        [0.0846, 0.0840],
        [0.0892, 0.0942],
        [0.0704, 0.0788],
        [0.0740, 0.0697],
        [0.0747, 0.0701],
        [0.0859, 0.0967],
        [0.0655, 0.0760],
        [0.0397, 0.0504],
        [0.0350, 0.0424],
        [0.0403, 0.0462],
        [0.0375, 0.0414],
        [0.0395, 0.0428],
        [0.0376, 0.0400],
        [0.0298, 0.0326],
        [0.0342, 0.0453],
        [0.0360, 0.0502],
        [0.0340, 0.0331],
        [0.0309, 0.0305],
        [0.0350, 0.0298],
        [0.0346, 0.0347],
        [0.0491, 0.0447],
        [0.0545, 0.0405],
        [0.0482, 0.0416],
        [0.0415, 0.0346],
        [0.0356, 0.0361],
        [0.0364, 0.0405],
        [0.0563, 0.0501],
        [0.0534, 0.0427],
        [0.0349, 0.0406],
        [0.0388, 0.0317],
        [0.0464, 0.0376],
        [0.0486, 0.0595],
        [0.0445, 0.0623],
        [0.0323, 0.0409],
        [0.0247, 0.0300],
        [0.0292, 0.0252],
        [0.0233, 0.0221],
        [0.0206, 0.0199],
        [0.0185, 0.0222],
        [0.0210, 0.0211],
        [0.0309, 0.0321],
        [0.0375, 0.0307],
        [0.0444, 0.0312],
        [0.0289, 0.0272],
        [0.0225, 0.0227],
        [0.0256, 0.0212],
        [0.0225, 0.0269],
        [0.0280, 0.0318],
        [0.0316, 0.0343],
        [0.0338, 0.0290],
        [0.0303, 0.0295],
        [0.0374, 0.0292],
        [0.0383, 0.0349],
        [0.0401, 0.0392],
        [0.0335, 0.0376],
        [0.0316, 0.0292],
        [0.0307, 0.0233],
        [0.0291, 0.0248],
        [0.0237, 0.0300],
        [0.0252, 0.0314],
        [0.0260, 0.0277],
        [0.0226, 0.0278],
        [0.0227, 0.0247],
        [0.0251, 0.0227],
        [0.0215, 0.0205],
        [0.0223, 0.0245],
        [0.0310, 0.0316],
        [0.0284, 0.0305],
        [0.0239, 0.0294],
        [0.0239, 0.0260],
        [0.0267, 0.0259],
        [0.0271, 0.0277],
        [0.0210, 0.0260],
        [0.0227, 0.0246],
        [0.0242, 0.0249],
        [0.0263, 0.0229],
        [0.0307, 0.0238],
        [0.0282, 0.0249],
        [0.0289, 0.0234],
        [0.0228, 0.0255],
        [0.0263, 0.0234],
        [0.0289, 0.0242],
        [0.0307, 0.0311],
        [0.0319, 0.0295],
        [0.0258, 0.0279],
        [0.0294, 0.0235],
        [0.0267, 0.0230],
        [0.0271, 0.0256]])

The per-frequency standard deviations are far below 1, so the feature spectrogram is not really unit normalised.

Whereas, if I "fix" the normalisation by removing the square root and repeat the same calculation, I get:

feat_spec.squeeze().std(axis=0)
tensor([[2.3296, 0.0000],
        [1.3393, 1.1812],
        [0.9863, 1.2168],
        [1.1237, 1.1676],
        [1.2367, 1.1289],
        [1.1755, 1.2254],
        [1.1668, 1.1551],
        [1.3028, 1.2769],
        [1.4278, 1.4436],
        [1.5003, 1.5904],
        [1.3261, 1.4367],
        [1.3822, 1.3554],
        [1.4817, 1.3798],
        [1.7359, 1.8143],
        [1.5310, 1.6814],
        [1.1912, 1.4886],
        [1.1763, 1.4043],
        [1.3484, 1.4549],
        [1.2866, 1.3960],
        [1.3701, 1.5043],
        [1.3465, 1.4387],
        [1.1227, 1.2723],
        [1.1466, 1.4495],
        [1.2039, 1.6014],
        [1.3519, 1.2652],
        [1.2835, 1.2271],
        [1.3928, 1.1938],
        [1.3658, 1.3786],
        [1.7323, 1.4352],
        [1.8235, 1.3663],
        [1.6639, 1.4657],
        [1.5581, 1.3309],
        [1.3403, 1.4950],
        [1.4371, 1.6201],
        [1.9739, 1.7786],
        [1.9526, 1.6100],
        [1.4855, 1.5853],
        [1.5229, 1.3026],
        [1.6172, 1.4281],
        [1.6101, 1.8892],
        [1.5990, 2.1108],
        [1.4510, 1.7498],
        [1.2546, 1.3937],
        [1.3603, 1.2353],
        [1.1743, 1.1317],
        [1.1172, 1.1020],
        [1.0012, 1.2032],
        [1.1398, 1.1405],
        [1.3787, 1.4434],
        [1.5753, 1.2559],
        [1.7045, 1.3374],
        [1.3102, 1.2665],
        [1.1642, 1.1396],
        [1.3129, 1.0739],
        [1.1107, 1.3278],
        [1.2442, 1.4601],
        [1.4026, 1.4837],
        [1.5103, 1.3081],
        [1.3626, 1.2847],
        [1.5241, 1.1882],
        [1.5153, 1.3750],
        [1.5315, 1.4012],
        [1.3444, 1.4306],
        [1.3527, 1.2662],
        [1.4215, 1.0987],
        [1.3362, 1.2002],
        [1.0996, 1.3726],
        [1.1805, 1.3284],
        [1.2345, 1.2658],
        [1.1338, 1.4018],
        [1.2715, 1.3374],
        [1.4306, 1.2368],
        [1.2349, 1.1801],
        [1.2234, 1.3517],
        [1.4583, 1.5467],
        [1.4278, 1.4832],
        [1.2620, 1.5466],
        [1.2846, 1.4759],
        [1.4174, 1.3683],
        [1.4542, 1.5140],
        [1.2170, 1.4376],
        [1.2494, 1.3206],
        [1.3139, 1.3805],
        [1.4034, 1.3058],
        [1.5339, 1.3417],
        [1.4687, 1.3555],
        [1.5251, 1.2322],
        [1.2840, 1.3854],
        [1.4100, 1.2823],
        [1.4906, 1.2987],
        [1.4625, 1.5008],
        [1.5403, 1.4719],
        [1.3185, 1.5270],
        [1.4710, 1.2591],
        [1.4023, 1.2440],
        [1.3820, 1.3304]])

and the numbers are close to 1.
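For reference, the running normalisation in band_unit_norm follows the exponential smoothing of eq. 12; a minimal Python sketch of that scheme, with alpha and init as illustrative values (not the actual constants used in lib.rs):

```python
import numpy as np

def unit_norm(spec, alpha=0.99, init=0.05):
    """Normalise each frame by an exponential moving average of |spec|.

    spec: complex array of shape (time, freq).
    alpha, init: illustrative smoothing constant and initial state.
    """
    state = np.full(spec.shape[1], init)
    out = np.empty_like(spec)
    for t in range(spec.shape[0]):
        # running estimate of the mean magnitude E|X| per frequency
        state = alpha * state + (1 - alpha) * np.abs(spec[t])
        # divide by the estimate directly -- no square root, since state
        # already estimates the mean magnitude, not the variance
        out[t] = spec[t] / state
    return out
```

On stationary input, the magnitudes of the output frames settle near 1 once the running estimate has converged, matching the tensor above.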

In practice, I've trained a model with the "fix" and it doesn't seem to make a noticeable difference (although admittedly I've not done a thorough set of tests).

mattpitkin · Feb 23 '24 12:02