VAD
After evaluating the few VADs I could find, I found the one included in this project to be far superior to the others. A lot of projects are using the VAD from Google’s WebRTC.
To evaluate your VAD, I built opusenc and logged the results similarly to what’s commented out here:
https://github.com/xiph/opus/blob/cdaf661e8d3e85770bf06db8cff12ae6be7fa2a6/src/analysis.c#L938
After reading through the code a couple of times, several questions arose:
- Why are all samples resampled to a sample rate of 48 kHz?
- Could we work together to decouple the VAD/music detection to a greater extent than it is now?
The latter would be very helpful in using the VAD independently of Opus.
- CELT internally operates at a sampling rate of 48 kHz. Also, 48 kHz is guaranteed to work for all audio (i.e. it can represent the whole audible spectrum)
- I believe Google was at some point using the speech/music detector in Chromium and/or WebRTC, so it's possible that work's already been done.
BTW, did you evaluate the VAD that's in RNNoise? Also, care to share your results (and methodology)?
I just evaluated it empirically on the audio I want to use it with: long-form speech recordings occasionally containing music. No statistical evaluation, sadly. That’s beyond my expertise.
- makes sense.
- Interesting! I’ll have to dig into that. Any references you have for me?
I did evaluate RNNoise back when it was new. It didn’t work well with my material. I have occasionally had a look since, but did not see much movement there. Anything I have missed?
@jmvalin I find that the VAD used in the Opus code is great. I want to change the frame duration from 20 ms to 16 ms, but I don't understand the principle behind the tonality_analysis function in the file <analysis.c>. There are many coefficients, for example, <band_log2[b+1] = .5f*1.442695f*(float)log(E+1e-10f);>. Could you please share some ideas about how to change the code, or point me to some papers? Thank you in advance.
@JanX2 Do you have a standalone repo of the VAD from the Opus tree?
@alokprasad No. Just something that’s hacked together to play around with it.