Understanding the per-frequency gain applied per band

Open · sevagh opened this issue 4 years ago · 4 comments

Hello, I'm having a hard time understanding the function interp_band_gain.

As I understand the paper, the gain applied to the FFT at each frequency bin is the sum of the amplitudes of all the bands to which that frequency belongs.

In code it looks like:

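  if (!silence) {
    /* one gain per band (NB_BANDS of them) from the RNN */
    compute_rnn(&st->rnn, g, &vad_prob, features);
    pitch_filter(X, P, Ex, Ep, Exp, g);
    for (i=0;i<NB_BANDS;i++) {
      /* let each band gain fall by at most a factor alpha per frame */
      float alpha = .6f;
      g[i] = MAX16(g[i], alpha*st->lastg[i]);
      st->lastg[i] = g[i];
    }
    /* spread the NB_BANDS band gains over the FREQ_SIZE FFT bins */
    interp_band_gain(gf, g);
#if 1
    /* apply the real-valued gain to each complex bin */
    for (i=0;i<FREQ_SIZE;i++) {
      X[i].r *= gf[i];
      X[i].i *= gf[i];
    }
#endif
  }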
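The `MAX16` loop limits how quickly a band's gain can fall. A gain can rise immediately, but in a single frame it cannot drop below `alpha` (0.6) times its previous value. Below is a minimal standalone sketch of just that step. It assumes plain `float` arrays, and `smooth_band_gains` and its parameters are hypothetical names, not part of rnnoise:

```c
#include <assert.h>

/* Hypothetical standalone version of the per-band smoothing step:
 * a gain may rise at once, but may fall to no less than alpha times the
 * previous frame's gain. last_g holds the previous frame's gains and is
 * updated in place, like st->lastg in rnnoise. */
void smooth_band_gains(float *g, float *last_g, int nb_bands, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (int i = 0; i < nb_bands; i++) {
        float floor_g = alpha * last_g[i];   /* decayed previous gain */
        if (g[i] < floor_g)
            g[i] = floor_g;                  /* equivalent to MAX16(g[i], alpha*lastg[i]) */
        last_g[i] = g[i];
    }
}
```

For example, if a band's RNN gain is 1, 0, 0 over three frames, the smoothed gain is 1, 0.6, 0.36 (up to float rounding).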

The code for interp_band_gain is:

void interp_band_gain(float *g, const float *bandE) {
  int i;
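  memset(g, 0, FREQ_SIZE*sizeof(*g)); /* memset counts bytes; passing just FREQ_SIZE (as quoted from rnnoise) clears only FREQ_SIZE bytes */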
  for (i=0;i<NB_BANDS-1;i++)
  {
    int j;
    int band_size;
    band_size = (eband5ms[i+1]-eband5ms[i])<<FRAME_SIZE_SHIFT;
    for (j=0;j<band_size;j++) {
      float frac = (float)j/band_size;
      g[(eband5ms[i]<<FRAME_SIZE_SHIFT) + j] = (1-frac)*bandE[i] + frac*bandE[i+1];
    }
  }
}
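It helps to write the loop as a formula. The symbols below are introduced here for illustration and do not appear in the code:

- $e_b$ is the FFT-bin index of band edge $b$ (`eband5ms[b]<<FRAME_SIZE_SHIFT`).
- $g_b$ is the gain passed in as `bandE[b]`.
- $w_b(k)$ is a triangular weight.

For a bin $k$ with $e_b \le k < e_{b+1}$, the loop sets (with $\varphi$ playing the role of `frac`):

$$
g(k) = (1-\varphi)\,g_b + \varphi\,g_{b+1}, \qquad \varphi = \frac{k - e_b}{e_{b+1} - e_b}.
$$

Now define a triangle that peaks at edge $e_b$:

$$
w_b(k) =
\begin{cases}
\dfrac{k - e_{b-1}}{e_b - e_{b-1}}, & e_{b-1} \le k < e_b,\\[6pt]
\dfrac{e_{b+1} - k}{e_{b+1} - e_b}, & e_b \le k < e_{b+1},\\[6pt]
0, & \text{otherwise.}
\end{cases}
$$

For $e_b \le k < e_{b+1}$, this gives $w_b(k) = 1-\varphi$ and $w_{b+1}(k) = \varphi$, and every other weight is zero. The loop is therefore the same as

$$
g(k) = \sum_{b} w_b(k)\, g_b, \qquad \sum_b w_b(k) = 1.
$$

Each bin between two edges gets contributions from two neighbouring triangles.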

To my knowledge, the Bark frequency (critical) bands do not overlap. So how can a single frequency belong to more than one band?

sevagh, Nov 29 '20 15:11

Why wouldn't I just do this instead (pseudocode):

band_gains = float[24];

for (j = 0; j < nfft; ++j)
    float bin_freq = j * sample_rate / nfft;
    if (band_0_left <= bin_freq < band_0_right)
        fft[j] *= band_gains[0];
    else if (band_1_left <= bin_freq < band_1_right)
        fft[j] *= band_gains[1];
    ...
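For comparison, here is a sketch that puts the rectangular scheme from the pseudocode next to the triangular one used by `interp_band_gain`. Both assume the band edges are already given as FFT-bin indices. `rect_band_gain` and `tri_band_gain` are hypothetical helpers. The two schemes need different numbers of gains:

- The rectangular version takes one gain per interval between edges.
- The triangular version takes one gain per edge, as rnnoise does with `NB_BANDS` edges and `NB_BANDS` gains.

```c
#include <assert.h>

/* Rectangular bands: every bin in [edges[b], edges[b+1]) takes gain g[b],
 * so g needs nb_edges - 1 entries and the gain jumps at each band edge.
 * gf must hold at least edges[nb_edges - 1] entries. */
void rect_band_gain(float *gf, const float *g, const int *edges, int nb_edges)
{
    assert(nb_edges >= 2);
    for (int b = 0; b < nb_edges - 1; b++)
        for (int k = edges[b]; k < edges[b + 1]; k++)
            gf[k] = g[b];
}

/* Triangular bands (same scheme as interp_band_gain): g holds one gain per
 * edge, and the gain is linearly interpolated between neighbouring edges. */
void tri_band_gain(float *gf, const float *g, const int *edges, int nb_edges)
{
    assert(nb_edges >= 2);
    for (int b = 0; b < nb_edges - 1; b++) {
        int size = edges[b + 1] - edges[b];
        for (int j = 0; j < size; j++) {
            float frac = (float)j / size;   /* position inside the band, 0..1 */
            gf[edges[b] + j] = (1 - frac) * g[b] + frac * g[b + 1];
        }
    }
}
```

Take edges `{0, 4, 8}` as an example. The rectangular version with gains `{1, 0}` jumps from 1 to 0 between bins 3 and 4. The triangular version with gains `{1, 0, 1}` gives 1, 0.75, 0.5, 0.25, 0, 0.25, 0.5, 0.75, so the gain changes gradually from bin to bin instead of stepping at each band edge.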

sevagh, Nov 29 '20 15:11

Where do these magic values come from?

static const opus_int16 eband5ms[] = {
/*0  200 400 600 800  1k 1.2 1.4 1.6  2k 2.4 2.8 3.2  4k 4.8 5.6 6.8  8k 9.6 12k 15.6 20k*/
  0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100
};

This looks inherited from the Opus codebase. Is it some transform of the Bark band frequency edges to DFT indices?
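The comment row already gives the edge frequencies. In rnnoise, each `eband5ms` value is shifted left by `FRAME_SIZE_SHIFT` (2) to get an FFT bin index. With 48 kHz audio and a 960-sample window, each bin is 50 Hz wide, so one unit is 4 × 50 = 200 Hz. Below is a sketch of that conversion. The macro values mirror rnnoise's 48 kHz configuration, `opus_int16` is replaced by `short` to keep the example standalone, and the helper names are hypothetical:

```c
#include <assert.h>

/* Values mirroring rnnoise at 48 kHz: 960-sample window, so FFT bins are
 * 48000/960 = 50 Hz apart, and one eband5ms unit is 1 << 2 = 4 bins = 200 Hz. */
#define SAMPLE_RATE      48000
#define WINDOW_SIZE      960
#define FRAME_SIZE_SHIFT 2
#define NB_BANDS         22

static const short eband5ms[NB_BANDS] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100
};

/* FFT bin index of band edge i, as used by interp_band_gain. */
int band_edge_bin(int i)
{
    assert(i >= 0 && i < NB_BANDS);
    return eband5ms[i] << FRAME_SIZE_SHIFT;
}

/* Frequency in Hz of band edge i: bin index times the bin spacing. */
float band_edge_hz(int i)
{
    return (float)band_edge_bin(i) * SAMPLE_RATE / WINDOW_SIZE;
}
```

For instance, edge 9 (`eband5ms` value 10) falls on bin 40, i.e. 2000 Hz. The last edge (100) falls on bin 400, i.e. 20 kHz. Both match the comment row.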

How can I create my own?

sevagh, Nov 30 '20 15:11

As far as I know from the paper, the band split is inherited from the Opus codec and is just an approximation of the Bark scale.
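Here is one way you might build a table of your own. This is a sketch under assumptions, with three steps:

1. Space the edges evenly on the Bark scale, using Traunmüller's (1990) approximation $z = 26.81f/(1960+f) - 0.53$.
2. Round each edge to 200 Hz units.
3. Force every band to be at least one unit wide.

This is not how the Opus table was derived, and it does not reproduce `eband5ms` exactly. `make_bark_edges` is a hypothetical helper. Also note that rnnoise's input features and output gains are sized by `NB_BANDS`, so a different band layout would also mean retraining the model.

```c
#include <assert.h>

/* Traunmüller (1990) approximation of the Bark scale, and its exact inverse. */
static double hz_to_bark(double f) { return 26.81 * f / (1960.0 + f) - 0.53; }
static double bark_to_hz(double z) { return 1960.0 * (z + 0.53) / (26.28 - z); }

/* Hypothetical helper: fills edges[0..nb_edges-1] with band edges in units of
 * unit_hz (200 Hz for an eband5ms-style table), spaced evenly in Bark from
 * 0 Hz to max_hz. Every band is forced to be at least one unit wide; with
 * many bands this can push the last edge past max_hz. */
void make_bark_edges(int *edges, int nb_edges, double max_hz, double unit_hz)
{
    assert(nb_edges >= 2 && max_hz > 0.0 && unit_hz > 0.0);
    double z0 = hz_to_bark(0.0);
    double z1 = hz_to_bark(max_hz);
    for (int i = 0; i < nb_edges; i++) {
        double z = z0 + (z1 - z0) * i / (nb_edges - 1);
        int e = (int)(bark_to_hz(z) / unit_hz + 0.5); /* round to nearest unit (value is >= 0) */
        if (i > 0 && e <= edges[i - 1])
            e = edges[i - 1] + 1;                      /* minimum width: one unit */
        edges[i] = e;
    }
}
```

With 22 edges up to 20 kHz and 200 Hz units, the table starts at 0 and ends at 100, like `eband5ms`, but the intermediate edges differ.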

guishengzhang, Jan 15 '21 03:01

The paper says: "Rather than rectangular bands, we use triangular bands, with the peak response being at the boundary between bands."

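So adjacent bands do overlap. Each bin between two band boundaries falls under two triangles: the one peaking at the boundary below it and the one peaking at the boundary above it.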
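To see the overlap numerically, the sketch below evaluates the triangular weight of each band at each bin. It assumes the band edges are given as FFT-bin indices, and `tri_weight` is a hypothetical helper. Every bin gets nonzero weight from at most two adjacent bands, and the weights add up to 1. That is how one bin can belong to two bands at once.

```c
#include <assert.h>

/* Hypothetical helper: weight of triangular band b at FFT bin k. Band b peaks
 * (weight 1) at edges[b] and falls linearly to 0 at the neighbouring edges.
 * edges[] is strictly increasing; meaningful for edges[0] <= k < edges[nb_edges-1]. */
float tri_weight(const int *edges, int nb_edges, int b, int k)
{
    assert(b >= 0 && b < nb_edges);
    if (b > 0 && k >= edges[b - 1] && k < edges[b])            /* rising side */
        return (float)(k - edges[b - 1]) / (edges[b] - edges[b - 1]);
    if (b < nb_edges - 1 && k >= edges[b] && k < edges[b + 1])  /* falling side */
        return (float)(edges[b + 1] - k) / (edges[b + 1] - edges[b]);
    return 0.0f;
}
```

For edges `{0, 4, 8}`, bin 5 gets weight 0.75 from band 1 and 0.25 from band 2. Bin 2 gets 0.5 from each of bands 0 and 1.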

guishengzhang, Jan 15 '21 03:01