SincNet
Shift vs overlap
Just to clarify: the paper says the overlap is 10 ms, but the code says the shift is 10 ms. Does that mean that between two consecutive frames both the start and the end move by only 10 ms, so the overlap is 190 ms?
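The arithmetic behind this question can be sketched as below. It assumes a 200 ms frame, which is what a 190 ms overlap with a 10 ms shift implies; the function names are hypothetical and not taken from the SincNet code.

```python
def frame_bounds(index, win_ms=200, shift_ms=10):
    """Return (start_ms, end_ms) of frame `index` for a sliding window.

    win_ms=200 is an assumption implied by the 190 ms overlap in the question.
    """
    start = index * shift_ms
    return start, start + win_ms


def overlap_ms(win_ms=200, shift_ms=10):
    """Overlap between two consecutive frames: window length minus shift."""
    return max(win_ms - shift_ms, 0)


print(frame_bounds(0))  # (0, 200)
print(frame_bounds(1))  # (10, 210)
print(overlap_ms())     # 190
```

With these numbers, "shift" is the distance between frame starts and "overlap" is what two neighbouring frames share, so a 10 ms shift and a 10 ms overlap describe very different framings.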
Btw, as a side note, I found that for more difficult tasks (in my case, classifying the output of a mixed-speech separator), SincNet trains better when it is given non-overlapping splices that cover the whole signal, rather than one random splice per example per iteration. In my case the accuracy rose from 83% to 91%.
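A minimal sketch of the two sampling schemes compared above, using plain Python sequences. The helper names, the 16 kHz sampling rate, and the 200 ms chunk length are assumptions for illustration; this is not the training code from the repository, and a trailing remainder shorter than one chunk is simply dropped here.

```python
import random


def random_chunk(signal, chunk_len, rng):
    """One random splice per example (the per-iteration scheme)."""
    start = rng.randrange(0, len(signal) - chunk_len + 1)
    return signal[start:start + chunk_len]


def covering_chunks(signal, chunk_len):
    """Non-overlapping splices that tile the signal from the start.

    Samples left over after the last full chunk are dropped (an assumption).
    """
    n_chunks = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


rng = random.Random(0)
signal = list(range(16000 * 3))  # placeholder for 3 s of audio at 16 kHz
chunk_len = 3200                 # 200 ms at 16 kHz (assumed)

one = random_chunk(signal, chunk_len, rng)
tiles = covering_chunks(signal, chunk_len)
print(len(one), len(tiles))  # 3200 15
```

Both helpers rely only on `len` and slicing, so they should also work on a NumPy array or a 1-D tensor.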
Hi Joseph, thank you for sharing your experience. For the SpeechBrain project (https://speechbrain.github.io/) and the PASE one (https://github.com/santi-pdp/pase) we didn't perform signal chunking directly. We just use convolutions with stride factors to simulate the sliding windows. In practice, to get a feature vector every 10 ms (160 samples), we use stride factors such as 2 x 2 x 2 x 4 x 5 over the convolutional layers that follow the sinc_conv one. This gives the same performance with a simpler pipeline.
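The stride arithmetic can be checked with a short sketch. The 16 kHz sampling rate is inferred from "10 ms (160 samples)"; the function names and the padding behaviour (each layer producing ceil(length / stride) outputs) are assumptions, not details of the SpeechBrain or PASE code.

```python
from math import prod

SAMPLE_RATE = 16000        # Hz, inferred from 160 samples = 10 ms
STRIDES = [2, 2, 2, 4, 5]  # stride factors quoted in the reply


def total_stride(strides):
    """Overall hop in samples: the product of the per-layer strides."""
    return prod(strides)


def hop_ms(strides, sample_rate):
    """Overall hop between output feature vectors, in milliseconds."""
    return 1000 * total_stride(strides) / sample_rate


def n_frames(n_samples, strides):
    """Output length, assuming each layer yields ceil(length / stride)."""
    length = n_samples
    for s in strides:
        length = -(-length // s)  # ceiling division
    return length


print(total_stride(STRIDES))         # 160
print(hop_ms(STRIDES, SAMPLE_RATE))  # 10.0
print(n_frames(16000, STRIDES))      # 100
```

Under these assumptions, one second of audio yields 100 feature vectors, the same rate as a sliding window with a 10 ms shift.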