Multi-resolution loss

Open aaronhsueh0506 opened this issue 3 years ago • 3 comments

Hi Rikorose,

Thanks again for your perfect research. I would like to learn more about the multi-resolution loss. I found that your code passes the time-domain signal to the multi-resolution loss.

Does this mean the following steps?

  1. First, suppose we have a time-domain signal of shape (1, 48000).
  2. Convert it to (100, 481) (using a 960-point RFFT, which gives 100 frames) -> spectral loss?
  3. Then go back to the time domain again: (1, 48000).
  4. Apply FFTs with different NFFT sizes, e.g. NFFT=1024 gives (94, 513) -> multi-resolution loss.
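The shapes in these steps can be checked with a little arithmetic. The sketch below is only an illustration and is not taken from the DeepFilterNet code: the helper name `stft_shape`, the default hop of `n_fft // 2`, and the zero-padding to a whole number of hops are all assumptions, chosen because they reproduce the (100, 481) and (94, 513) shapes quoted above. A different padding or centering convention can shift the frame count by one.

```python
from typing import Optional, Tuple


def stft_shape(num_samples: int, n_fft: int, hop: Optional[int] = None) -> Tuple[int, int]:
    """Return (frames, bins) of a one-sided STFT.

    Assumptions (hypothetical, for illustration only): the hop defaults to
    n_fft // 2 and the signal is zero-padded to a whole number of hops.
    """
    if hop is None:
        hop = n_fft // 2
    frames = -(-num_samples // hop)  # integer ceiling of num_samples / hop
    bins = n_fft // 2 + 1            # a real FFT keeps only non-negative frequencies
    return frames, bins


num_samples = 48000  # one second at 48 kHz

# Step 2: the resolution used for the spectral loss.
model_shape = stft_shape(num_samples, 960)

# Step 4: the same time-domain signal analysed at several other FFT sizes.
loss_shapes = {n_fft: stft_shape(num_samples, n_fft) for n_fft in (256, 512, 1024, 2048)}

print(model_shape)
for n_fft, shape in loss_shapes.items():
    print(n_fft, shape)
```

Going back to the time domain in step 3 is what makes step 4 possible: each FFT size needs its own framing of the waveform, so the extra spectrograms cannot be derived directly from the 960-point one.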

If so, I have some questions. For stage 2 (DF), we only used 96 bins in deepfilternet, which needs to be changed because NFFT? This seems to be a downsampling case? I mean if I use a 24k sample rate signal, it's the same as changing the NFFT to 48k sample rate.

Thanks, Aaron

aaronhsueh0506 avatar Sep 21 '22 09:09 aaronhsueh0506

Does this mean the following steps?

Yes

For stage 2 (DF), we only used 96 bins in deepfilternet, which needs to be changed because NFFT?

You can change the 96 bins, but you don't need to. The MR loss mostly targets those lower frequencies, and you could of course also think about truncating the MR spectrograms at around 5 kHz.
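As a rough illustration of what such a truncation would involve, the sketch below computes how many bins of each MR spectrogram lie below a given cutoff. It assumes a 48 kHz sample rate and the 960-point FFT mentioned earlier in the thread, so that 96 bins of 50 Hz each cover 0 to 4.8 kHz. The helper `bins_below` and the list of MR FFT sizes are hypothetical and not part of DeepFilterNet.

```python
def bins_below(cutoff_hz: int, sample_rate: int, n_fft: int) -> int:
    """Number of one-sided FFT bins whose centre frequency is below cutoff_hz.

    Bin k sits at k * sample_rate / n_fft, so the count is
    ceil(cutoff_hz * n_fft / sample_rate), computed here with integers.
    """
    return -(-cutoff_hz * n_fft // sample_rate)


sample_rate = 48000          # assumed sample rate
df_bins = 96                 # DF bins at the 960-point FFT
cutoff_hz = df_bins * sample_rate // 960  # upper edge of the DF band: 4800 Hz

# Bins to keep per MR spectrogram so that all of them stop near the same frequency.
keep = {n_fft: bins_below(cutoff_hz, sample_rate, n_fft) for n_fft in (256, 512, 1024, 2048)}

print(cutoff_hz)
for n_fft, n_bins in keep.items():
    print(n_fft, n_bins)
```

The bin count differs per FFT size because the bin spacing differs, which is why a fixed "96 bins" does not carry over to the other resolutions.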

This seems to be a downsampling case? I mean if I use a 24k sample rate signal, it's the same as changing the NFFT to 48k sample rate.

Not sure what you mean here. If you have a 24kHz signal, then the frequencies > 12kHz are just zero and should not affect the loss.

Rikorose avatar Sep 21 '22 09:09 Rikorose

Hi Rikorose,

Thanks for such a quick reply. Well, I think I misunderstood. Because the MR loss is computed after the mask and DF, there is no need to change the bins. If so, I don't understand what effect the different NFFT sizes are meant to have.

I wanted to analyze the relationship between a lower sampling rate and computational complexity, but I think this would increase latency and may require retraining the model. (At first I thought this was the same thing as multi-resolution, but I now think I was wrong.)

aaronhsueh0506 avatar Sep 21 '22 09:09 aaronhsueh0506

Low NFFT is for high time resolution on the loss side. High NFFT is for a high frequency resolution. Just plot some spectrograms with different FFT sizes and you can see the differences.
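The trade-off can also be put in numbers before plotting anything. The sketch below assumes a 48 kHz sample rate. The helper `resolution` is hypothetical, and it treats the window length as equal to the FFT size, which is an assumption rather than something stated in this thread.

```python
from typing import Tuple


def resolution(sample_rate: int, n_fft: int) -> Tuple[float, float]:
    """Return (window length in ms, bin spacing in Hz) for one FFT size.

    Assumes the analysis window spans exactly n_fft samples.
    """
    window_ms = 1000.0 * n_fft / sample_rate  # longer window -> coarser timing
    bin_hz = sample_rate / n_fft              # longer window -> finer frequency grid
    return window_ms, bin_hz


sample_rate = 48000  # assumed sample rate
table = {n_fft: resolution(sample_rate, n_fft) for n_fft in (256, 512, 960, 1024, 2048)}

for n_fft, (window_ms, bin_hz) in table.items():
    print(f"n_fft={n_fft:5d}  window={window_ms:6.2f} ms  bin spacing={bin_hz:7.2f} Hz")
```

Doubling the FFT size halves the bin spacing and doubles the window length, so no single size is sharp in both time and frequency. Comparing spectrograms at several sizes lets the loss cover both.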

Rikorose avatar Sep 21 '22 09:09 Rikorose

Ok, I got it, thanks for your explanation.

aaronhsueh0506 avatar Sep 29 '22 08:09 aaronhsueh0506