David Nicholson
This issue needs more specifics about where we are and aren't doing this -- I think I might have addressed it mostly? Will come back and check when we get...
Original algorithm (according to Fukuzawa): Rabiner and Sambur 1975, which thresholds the amplitude envelope and then fine-tunes the boundaries using zero crossings. Attached is a Matlab implementation from https://www.mathworks.com/matlabcentral/fileexchange/70424-voice-activity-detection-vad-rabiner1975: [upload.zip](https://github.com/vocalpy/vocalpy/files/9301398/upload.zip)
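To make the two-stage idea concrete, here is a minimal sketch of "threshold the envelope, then refine with zero crossings". It is not a port of the attached Matlab code or a faithful reproduction of the paper: the function names (`frame_features`, `segment`), the use of mean absolute amplitude per frame as the envelope, the fixed thresholds, and the `max_extend` limit are all assumptions made for illustration.

```python
def frame_features(samples, frame_len):
    """Return (envelope, zero-crossing count) for each non-overlapping frame."""
    envelope, zero_crossings = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Mean absolute amplitude stands in for the amplitude envelope (assumption).
        envelope.append(sum(abs(x) for x in frame) / frame_len)
        # Count sign changes between consecutive samples.
        zero_crossings.append(
            sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        )
    return envelope, zero_crossings


def segment(samples, frame_len, env_thresh, zc_thresh, max_extend=5):
    """Return a list of (onset_frame, offset_frame) pairs; offsets are exclusive."""
    envelope, zcs = frame_features(samples, frame_len)
    n = len(envelope)

    # Stage 1: coarse segments are runs of frames whose envelope exceeds the threshold.
    segments = []
    i = 0
    while i < n:
        if envelope[i] > env_thresh:
            j = i
            while j < n and envelope[j] > env_thresh:
                j += 1
            segments.append([i, j])
            i = j
        else:
            i += 1

    # Stage 2: push each boundary outward while neighboring frames have many
    # zero crossings (quiet but noisy sound), up to max_extend frames.
    # This sketch does not merge segments that grow into each other.
    for seg in segments:
        steps = 0
        while seg[0] > 0 and steps < max_extend and zcs[seg[0] - 1] > zc_thresh:
            seg[0] -= 1
            steps += 1
        steps = 0
        while seg[1] < n and steps < max_extend and zcs[seg[1]] > zc_thresh:
            seg[1] += 1
            steps += 1
    return [tuple(seg) for seg in segments]
```

Frame indices can be converted to seconds by multiplying by `frame_len / sampling_rate`. The sketch is written in plain Python so it runs without dependencies; a real implementation would more likely operate on NumPy arrays.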
Note that @dgmets said in discussion with @sthaar and @yangzheng-121 that a similar approach that worked well was:
+ take sliding window of amplitude
+ find first peak of autocorr...
The next algorithm after Rabiner + Sambur according to Fukuzawa is Harma: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.217.9793&rep=rep1&type=pdf

A method tailored to birdsong that uses sinusoidal modeling of syllables.

Matlab implementation here: https://www.mathworks.com/matlabcentral/fileexchange/29261-harma-syllable-segmentation

Python implementation by...
Talking with @nickjourjine who mentioned they are using the segmentation functions built into AVA. I'm guessing that it's similar to what's in `evfuncs` but it's always helpful to see multiple...
Thank you @nickjourjine Looks like they segment the spectrogram, i.e. summing spectral power in each time bin then thresholding that. Interesting that they use multiple thresholds to find local maxima...
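For comparison with the other approaches in this thread, the basic "sum spectral power in each time bin, then threshold" step could be sketched as below. This is not AVA's code and it uses a single threshold rather than the multiple thresholds mentioned above; the function name `segment_spectrogram`, the row-per-frequency-bin layout, and the `threshold` parameter are assumptions for illustration.

```python
def segment_spectrogram(spect, times, threshold):
    """Segment by thresholding summed spectral power.

    spect: list of rows, one row per frequency bin, one value per time bin.
    times: time of each time bin (any monotonically increasing values).
    Returns (amplitude trace, list of (onset_time, offset_time) pairs).
    """
    n_time = len(times)
    # Collapse the frequency axis: one summed-power value per time bin.
    amp = [sum(row[t] for row in spect) for t in range(n_time)]
    above = [a > threshold for a in amp]

    onsets, offsets = [], []
    for t in range(n_time):
        prev = above[t - 1] if t > 0 else False
        if above[t] and not prev:
            onsets.append(times[t])   # upward threshold crossing
        if prev and not above[t]:
            offsets.append(times[t])  # downward threshold crossing
    if above and above[-1]:
        # Segment still open at the end of the file: close it at the last bin.
        offsets.append(times[-1])
    return amp, list(zip(onsets, offsets))
```

The amplitude trace is returned alongside the segments so that it can be plotted against the chosen threshold when tuning.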
Good idea, thank you @nickjourjine I should have thought to look at `scikit-maad`. They have refs to papers in their docstrings, which is helpful for trying to figure out what...
Adding one more: when we last met, @sthaar asked what @theresekoch is using to segment zebra finch song before training `tweetynet` (@theresekoch apologies in advance for dragging you into a...
> Does that help?

Yes, very much so, thank you for letting us know @theresekoch

To make sure I understand:
* You segment with the algorithm in evsonganaly, which is...
Very helpful to know, thank you.

> my understanding is that RMSE is conceptually exactly the same as amplitude thresholding

Not sure how much it matters but evsonganaly uses audio...