DeepSpeaker-pytorch

Number of frames

Open: Ruslanmlnkv opened this issue 7 years ago • 6 comments

Hi! Why do you use such a small number of frames by default (32, as far as I can see)? The VoxCeleb dataset wasn't preprocessed to drop silent segments, so many parts of the training data are pure silence. Accuracy grows when I use a larger number of frames (of course, not only because fewer inputs are pure silence). Maybe you have done some experiments with the number of frames?

Ruslanmlnkv, Oct 25 '17 10:10
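One way to act on the silence point above, sketched under assumptions: drop low-energy frames before choosing fixed-length training windows. This is a crude energy threshold relative to the loudest frame, not a trained VAD, and `split_frames`, `drop_silent_frames` and the -40 dB default are hypothetical names and values, not part of DeepSpeaker-pytorch. It operates on raw waveform samples, not on log filterbank features.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a waveform into fixed-length frames that start every `hop` samples."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]


def drop_silent_frames(frames: Sequence[Sequence[float]],
                       rel_threshold_db: float = -40.0) -> List[Sequence[float]]:
    """Keep frames whose mean power is within `rel_threshold_db` dB of the loudest frame.

    Hypothetical helper: a simple energy threshold, not a trained VAD.
    """
    powers = [sum(x * x for x in f) / len(f) for f in frames]
    peak = max(powers, default=0.0)
    if peak == 0.0:
        return []  # empty input or pure digital silence
    floor = peak * 10 ** (rel_threshold_db / 10)  # convert dB to a power ratio
    return [f for f, p in zip(frames, powers) if p >= floor]


if __name__ == "__main__":
    # 400 zeros, then 400 samples of a loud square wave, then 400 zeros
    wave = [0.0] * 400 + [0.5, -0.5] * 200 + [0.0] * 400
    frames = split_frames(wave, frame_len=400, hop=400)
    voiced = drop_silent_frames(frames)
    print(len(frames), len(voiced))  # 3 1
```

At 16 kHz, `frame_len=400` and `hop=160` correspond to 25 ms frames every 10 ms; the threshold would need tuning on real recordings.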

How much accuracy did you reach?

mshenron, Oct 26 '17 07:10

Yes, I tested the model with 32 frames. A slightly larger input, 36 frames, was also tested, with similar results. I agree that using longer inputs (about 300 frames or more) gives higher accuracy; that is also something to test. Another approach is to extract many inputs from a single wave file and use the mean of the resulting output vectors.

qqueing, Oct 26 '17 13:10
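The averaging approach mentioned above can be sketched as follows, assuming the features are a list of frames and that `embed` stands in for the model's forward pass on one fixed-length window (both are hypothetical here, not the repository's API). The window length and hop are free parameters; 32 frames matches the default discussed in this thread.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
Embedding = List[float]


def sliding_windows(frames: Sequence[Frame], num_frames: int, hop: int) -> List[Sequence[Frame]]:
    """Windows of `num_frames` consecutive frames, starting every `hop` frames."""
    if len(frames) < num_frames:
        return [frames]  # assumption: use a short utterance whole, as one window
    return [frames[i:i + num_frames]
            for i in range(0, len(frames) - num_frames + 1, hop)]


def utterance_embedding(frames: Sequence[Frame],
                        embed: Callable[[Sequence[Frame]], Embedding],
                        num_frames: int = 32,
                        hop: int = 16) -> Embedding:
    """Embed every window with `embed` and return the element-wise mean."""
    vectors = [embed(w) for w in sliding_windows(frames, num_frames, hop)]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


if __name__ == "__main__":
    # Toy stand-in for the network: the mean of the window's single feature.
    def toy_embed(window: Sequence[Frame]) -> Embedding:
        return [sum(f[0] for f in window) / len(window)]

    feats = [[float(i)] for i in range(64)]
    print(utterance_embedding(feats, toy_embed))  # [31.5]
```

If the network only accepts exactly `num_frames` frames, short utterances would need padding or repetition instead of the whole-utterance fallback used here.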

Currently, I am reworking the entire framework. Accuracy depends on the input size mentioned above and on length normalization (taking the mean of the output vectors), so I need to run more experiments.

qqueing, Oct 26 '17 13:10

My best accuracy is 88% (with 300 frames). I also experimented with 32 frames (78%) and 100 frames (84%). Accuracy is still going up for all models, but I think only by a few percent.

Ruslanmlnkv, Oct 26 '17 14:10

How can I change the number of frames used for testing?

Cold-Winter, Jun 18 '18 01:06

@Cold-Winter You can look at constant.py: the number of frames is NUM_NEXT_FRAME + NUM_PREVIOUS_FRAME.

Nisoka, Dec 10 '18 06:12
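A minimal sketch of how such a frame count could be used, assuming the input window is NUM_PREVIOUS_FRAME frames before a center frame plus NUM_NEXT_FRAME frames from the center on (an assumption about how the constants are used, not confirmed by the thread). The values 9 and 23 are hypothetical, picked only so they sum to the 32-frame default; the real ones are in constant.py. Under this reading, testing with, for example, 300 frames means setting the two constants so they sum to 300. `crop_window` and `random_window` are illustrative helpers, not functions from the repository.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical values: chosen only so that they sum to the 32-frame default.
NUM_PREVIOUS_FRAME = 9
NUM_NEXT_FRAME = 23
NUM_FRAMES = NUM_PREVIOUS_FRAME + NUM_NEXT_FRAME  # frames per model input


def crop_window(frames: Sequence[T], center: int) -> Sequence[T]:
    """Return frames[center - NUM_PREVIOUS_FRAME : center + NUM_NEXT_FRAME]."""
    start = center - NUM_PREVIOUS_FRAME
    end = center + NUM_NEXT_FRAME
    if start < 0 or end > len(frames):
        raise ValueError("window does not fit inside the utterance")
    return frames[start:end]


def random_window(frames: Sequence[T], rng: Optional[random.Random] = None) -> Sequence[T]:
    """Pick a random center so that the whole window fits (e.g. for training)."""
    rng = rng or random.Random()
    # randint is inclusive at both ends, so the largest center gives end == len(frames)
    center = rng.randint(NUM_PREVIOUS_FRAME, len(frames) - NUM_NEXT_FRAME)
    return crop_window(frames, center)
```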