VGGVox-PyTorch
VGGVox-PyTorch copied to clipboard
Implementing VGGVox for Speaker Identification on VoxCeleb1 dataset in PyTorch.
VGGVox-PyTorch
Implementing VGGVox for VoxCeleb1 dataset in PyTorch.
Train
pip install -r requirements.txt
python3 train.py --dir ./data/
Specify data dir with --dir
Notes
- 81.79% Top-1 & 93.17 Top-5 Test-set accuracy, pretty satisfactory. Find details in results.txt.
- Training on the V100 takes 4 mins per epoch.
Model
- Run
python3 vggm.py
for model architecture. - Best model weights uploaded VGGM300_BEST_140_81.99.pth
What i've done so far:
- [x] All the data preprocessed exactly as author's matlab code. Checked and verified online on matlab
- [x] Random 3s cropped segments for training.
- [x] Copy all hyperparameter... LR, optimizer params, batch size from the author's net.
- [x] Stabilize PyTorch's BatchNorm and test variants. Improved results by a small percentage.
- [x] Try onesided spectrogram input as mentioned on the author's github.
- [ ] ~~Port the authors network from matlab and train. The matlab model has 1300 outputs dimension, will test it later.~~
- [ ] ~~Copy weights from the matlab network and test.~~
References and Citations:
- VGGVox
- linhdvu14's vggvox-speaker-identification
- jameslyons's python_speech_features
@InProceedings{Nagrani17,
author = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
title = "VoxCeleb: a large-scale speaker identification dataset",
booktitle = "INTERSPEECH",
year = "2017",
}
@InProceedings{Nagrani17,
author = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
title = "VoxCeleb2: Deep Speaker Recognition",
booktitle = "INTERSPEECH",
year = "2018",
}