DeepSpeaker-pytorch
Number of frames
Hi! Why are you using such a low number of frames by default (32, as far as I can see)? The VoxCeleb dataset wasn't preprocessed to drop silent segments, so many parts of the training data are only silence. Accuracy grows when I use a larger number of frames (of course, silence is not the only reason for that). Maybe you did some experiments with the number of frames?
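If you preprocess VoxCeleb yourself, one way to act on the silence point is a simple energy-based silence filter. This is a generic technique swapped in for illustration, not something the repo does; the frame length of 400 samples, the hop of 160 samples (25 ms and 10 ms at 16 kHz) and the 30 dB threshold are all assumed values.

```python
import numpy as np


def frame_signal(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (num_frames, frame_len)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return signal[idx]


def drop_silent_frames(frames: np.ndarray, threshold_db: float = 30.0) -> np.ndarray:
    """Keep frames whose energy is within `threshold_db` of the loudest frame."""
    energy = np.sum(frames.astype(np.float64) ** 2, axis=1)
    # The small epsilon avoids log10(0) on digitally silent frames.
    energy_db = 10.0 * np.log10(energy + 1e-10)
    return frames[energy_db > energy_db.max() - threshold_db]


# Toy example: 0.1 s of silence followed by 0.1 s of a 440 Hz tone at 16 kHz.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(1600) / sr)
signal = np.concatenate([np.zeros(1600), tone])
frames = frame_signal(signal)
kept = drop_silent_frames(frames)
```

A fixed threshold relative to the loudest frame is crude; on noisy recordings a dedicated voice activity detector would likely be more robust.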
How much accuracy did you reach?
Yes, I tested the model using 32 frames. A slightly longer window of 36 frames was also tested, but the result was similar. I agree that using a long window (about 300 frames or more) gives higher accuracy; that is also something to test. Another approach is to extract many inputs from a single wave file and use the mean of the resulting output vectors.
Currently, I am editing the entire framework. Accuracy depends on the input size mentioned above and on length normalization (the mean of the output vectors), so I need more experimentation.
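The "many inputs from one wave file, then average" idea can be sketched as below. This is a minimal sketch, assuming the features of one utterance are a (total_frames, feature_dim) array; `utterance_embedding` is a hypothetical helper, and `embed` is a stand-in for the trained model, not a function from the repo. Normalizing each window embedding before averaging, and the 32-frame window with a 16-frame hop, are assumptions.

```python
from typing import Callable

import numpy as np


def utterance_embedding(
    features: np.ndarray,
    embed: Callable[[np.ndarray], np.ndarray],
    num_frames: int = 32,
    hop: int = 16,
) -> np.ndarray:
    """Embed every `num_frames` window of an utterance and average the results.

    `embed` maps one (num_frames, feature_dim) window to a 1-D embedding.
    """
    if features.shape[0] < num_frames:
        raise ValueError("utterance is shorter than one window")
    starts = range(0, features.shape[0] - num_frames + 1, hop)
    embs = np.stack([embed(features[s : s + num_frames]) for s in starts]).astype(np.float64)
    # L2-normalize each window embedding so no single window dominates the mean.
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    mean = embs.mean(axis=0)
    # Re-normalize the mean so utterances can be compared with a dot product.
    return mean / np.linalg.norm(mean)


# Usage with a dummy "model" that just averages the window over time.
rng = np.random.default_rng(0)
feats = rng.standard_normal((300, 64))
e = utterance_embedding(feats, lambda w: w.mean(axis=0))
```

Because the result is unit-length, the score between two utterances is simply the dot product of their averaged embeddings (cosine similarity).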
The best accuracy is 88% (for 300 frames). I also experimented with 32 frames (78%) and 100 frames (84%). Accuracy is still increasing for all models, but I think only by a few percent.
How can I change the number of frames for testing?
@Cold-Winter you can look at constant.py: the number of frames is NUM_NEXT_FRAME + NUM_PREVIOUS_FRAME.
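To see how the two constants combine into one window, here is a minimal sketch. It assumes an utterance's features are stored as a (num_frames, feature_dim) array and that a window is cut around a center frame; `crop_window` and `random_crop` are hypothetical helpers and the 16/16 split is illustrative, so the repo's actual values and slicing code may differ.

```python
import numpy as np

# Illustrative values only; check constant.py for what the repo actually uses.
NUM_PREVIOUS_FRAME = 16
NUM_NEXT_FRAME = 16
NUM_FRAMES = NUM_PREVIOUS_FRAME + NUM_NEXT_FRAME  # 32


def crop_window(features, center, prev=NUM_PREVIOUS_FRAME, nxt=NUM_NEXT_FRAME):
    """Hypothetical helper: `prev` frames before `center` plus `nxt` frames from `center` on."""
    if center - prev < 0 or center + nxt > features.shape[0]:
        raise ValueError("window does not fit inside the utterance")
    return features[center - prev : center + nxt]


def random_crop(features, rng, prev=NUM_PREVIOUS_FRAME, nxt=NUM_NEXT_FRAME):
    """Hypothetical helper: crop a window around a uniformly random center frame."""
    if features.shape[0] < prev + nxt:
        raise ValueError("utterance is shorter than the window")
    # integers() excludes `high`, so the last valid center is shape[0] - nxt.
    center = rng.integers(prev, features.shape[0] - nxt + 1)
    return crop_window(features, center, prev, nxt)
```

Under this layout, testing with 300 frames as in the experiments above would mean choosing the two constants so that they sum to 300, for example 150 and 150.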