DeepSpeaker-pytorch Numbers of frames

Numbers of frames

Open Ruslanmlnkv opened this issue 8 years ago • 6 comments

Hi! Why are you using such a low number of frames as the default (32, as far as I can see)? The VoxCeleb dataset wasn't preprocessed to drop silence segments, so many parts of the training data are only silence. Accuracy grows when I use a greater number of frames (of course, that's not only because of the silence segments). Maybe you have done some experiments with the number of frames?

Oct 25 '17 10:10 Ruslanmlnkv

How much accuracy did you reach?

Oct 26 '17 07:10 mshenron

Yes, I tested the model using 32 frames. A slightly longer input (36 frames) was also tested, but the result was similar. I agree that using a long input (about 300 frames or more) gives higher accuracy. It is also something to test. Another approach is to extract many inputs from a single wave and use the mean of the output vectors.
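A minimal sketch of the "many inputs from a single wave, then average" idea, for illustration only. It is not the repository's code: `split_into_windows`, `mean_embedding`, `toy_embed`, the hop size, the zero padding of short inputs, and the final L2 normalization are all assumptions made for this example. `embed_fn` stands in for whatever trained model maps one window of frames to one output vector.

```python
import numpy as np


def split_into_windows(features, num_frames=32, hop=16):
    """Cut a (total_frames, feat_dim) array into fixed-length windows.

    Inputs shorter than num_frames are zero-padded at the end
    (an assumption of this sketch, not something the thread specifies).
    """
    total = features.shape[0]
    if total < num_frames:
        pad = np.zeros((num_frames - total, features.shape[1]), dtype=features.dtype)
        features = np.concatenate([features, pad], axis=0)
        total = num_frames
    starts = range(0, total - num_frames + 1, hop)
    return np.stack([features[s:s + num_frames] for s in starts])


def mean_embedding(features, embed_fn, num_frames=32, hop=16):
    """Embed every window with embed_fn and average the output vectors."""
    windows = split_into_windows(features, num_frames, hop)
    embeddings = np.stack([embed_fn(w) for w in windows])
    mean = embeddings.mean(axis=0)
    # Optional L2 normalization of the averaged vector (assumption).
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean


def toy_embed(window):
    """Hypothetical stand-in for a trained network: average over time."""
    return window.mean(axis=0)


if __name__ == "__main__":
    utterance = np.random.rand(300, 40)  # 300 frames, 40 features per frame
    vector = mean_embedding(utterance, toy_embed, num_frames=32, hop=16)
    print(vector.shape)
```

With a scheme like this, the model can keep a short fixed input (for example 32 frames) while the utterance-level vector still draws on the whole recording.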

Oct 26 '17 13:10 qqueing

Currently, I am editing the entire framework. Accuracy depends on the above-mentioned input size or length normalization (mean of the output vectors), so I need more experimentation.

Oct 26 '17 13:10 qqueing

The best accuracy is 88% (for 300 frames). I also experimented with 32 frames (78%) and 100 frames (84%). Accuracy keeps growing for all models, but I think it's only a few percent.

Oct 26 '17 14:10 Ruslanmlnkv

How can I change the number of frames for testing?

Jun 18 '18 01:06 Cold-Winter

@Cold-Winter you can look at constant.py: the number of frames == NUM_NEXT_FRAME + NUM_PREVIOUS_FRAME
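A small sketch of how a frame count built from two such constants could be used to crop an input window. Only the names `NUM_PREVIOUS_FRAME` and `NUM_NEXT_FRAME` come from the comment above; the values 16 and 16 (chosen to sum to the default of 32 mentioned earlier in the thread) and the helper `crop_window` are hypothetical, so check the repository's constants file for the real settings.

```python
import numpy as np

# Hypothetical values for illustration; the real ones live in the repo's constants file.
NUM_PREVIOUS_FRAME = 16
NUM_NEXT_FRAME = 16
NUM_FRAMES = NUM_PREVIOUS_FRAME + NUM_NEXT_FRAME


def crop_window(features, center, prev=NUM_PREVIOUS_FRAME, nxt=NUM_NEXT_FRAME):
    """Return prev frames before `center` plus nxt frames from `center` onward."""
    start = center - prev
    end = center + nxt
    if start < 0 or end > features.shape[0]:
        raise ValueError("window does not fit inside the utterance")
    return features[start:end]


if __name__ == "__main__":
    utterance = np.zeros((100, 40))  # 100 frames, 40 features per frame
    print(crop_window(utterance, center=50).shape)
```

Under these assumptions, raising either constant lengthens the input window used for training and testing alike.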

Dec 10 '18 06:12 Nisoka
