VAD

Open JanX2 opened this issue 6 years ago • 5 comments

After evaluating a few VADs I could find, the one included in this project is far superior to the others. A lot of projects are using the VAD from Google’s WebRTC.

To evaluate your VAD, I built opusenc and logged the results similarly to what’s commented out here: https://github.com/xiph/opus/blob/cdaf661e8d3e85770bf06db8cff12ae6be7fa2a6/src/analysis.c#L938

After reading through the code a couple of times, several questions arose:

  1. Why are all samples resampled to a sample rate of 48 kHz?
  2. Could we work together to decouple the VAD/music detection to a greater extent than it is now?

The latter would be very helpful in using the VAD independently of Opus.

JanX2 avatar Apr 29 '19 18:04 JanX2

  1. CELT internally operates at a sampling rate of 48 kHz. Also, 48 kHz is guaranteed to work for all audio (i.e. it can represent the whole audible spectrum)
  2. I believe Google was at some point using the speech/music detector in chromium and/or webrtc, so it's possible that work's already been done

BTW, did you evaluate the VAD that's in RNNoise? Also, care to share your results (and methodology)?

jmvalin avatar Apr 29 '19 19:04 jmvalin

I just evaluated it empirically on the audio I want to use it with: long-form speech recordings that occasionally contain music. No statistical evaluation, sadly. That’s beyond my expertise.

  1. makes sense.
  2. Interesting! I’ll have to dig into that. Any references you have for me?

I did evaluate RNNoise back when it was new. It didn’t work well with my material. I have occasionally had a look since, but did not see much movement there. Anything I have missed?

JanX2 avatar Apr 30 '19 17:04 JanX2

@jmvalin I find that the VAD used in the Opus code is great. I want to change the frame duration from 20 ms to 16 ms, but I don't understand the principle behind the tonality_analysis function in <analysis.c>. There are many coefficients, for example <band_log2[b+1] = .5f*1.442695f*(float)log(E+1e-10f);>. Could you please share some ideas about how to change the code, or point me to some papers? Thank you in advance.

xinkez avatar Jul 09 '19 11:07 xinkez

@JanX2 Do you have a standalone repo of the VAD from the Opus tree?

alokprasad avatar Oct 07 '19 12:10 alokprasad

@alokprasad No. Just something that’s hacked together to play around with it.

JanX2 avatar Oct 19 '19 16:10 JanX2