PyTorch_Speaker_Verification
Shuffling the wav files in the dataloader does not ensure that all training files are visited in each epoch. As a result, the model is trained on N*M utterances per epoch rather than on the whole training set. This affects convergence as well as possible extensions of the code (e.g. early stopping), where, following the referenced paper:

N = number of speakers per batch
M = number of utterances per speaker per batch
https://github.com/HarryVolek/PyTorch_Speaker_Verification/blob/10e159a8d3255503c0184cde4eb7097968857a31/data_load.py#L39-L40
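To illustrate the coverage gap, here is a minimal sketch (hypothetical values, not the repo's actual code): when a speaker has K utterances on disk and the dataloader draws only M of them at random per epoch, the remaining K-M files are never touched that epoch.

```python
import random

# Hypothetical illustration of the sampling behaviour: each epoch the
# dataloader draws M random utterances per speaker, so with K utterances
# available (K > M) some files are simply skipped for that epoch.
random.seed(0)

K = 100  # utterances stored for one speaker (VoxCeleb-scale, assumed value)
M = 6    # utterances sampled per speaker per batch (assumed value)
utterances = list(range(K))

seen = set(random.sample(utterances, M))  # one epoch's draw for this speaker
unseen = set(utterances) - seen
print(f"seen {len(seen)} of {K}; {len(unseen)} utterances skipped this epoch")
```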
For the TIMIT dataset, where M=9 (I think), the dataloader may be OK. The issue appears in large datasets such as VoxCeleb1 or VoxCeleb2, where M>50.
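One possible fix, sketched below under assumptions (this is not the repo's API; `epoch_chunks` and the file names are made up for illustration): shuffle each speaker's utterance list once per epoch and yield consecutive chunks of M, so that every file is visited before any file repeats.

```python
import random

def epoch_chunks(utterances, M, rng=random):
    """Hypothetical per-epoch sampler: shuffle once, then slice into
    consecutive batches of M so the whole list is covered each epoch."""
    order = utterances[:]          # copy so the caller's list is untouched
    rng.shuffle(order)
    for i in range(0, len(order) - M + 1, M):
        yield order[i:i + M]       # one batch slice of M utterances

# Toy usage: 20 files, M=4 -> 5 batches that together cover all 20 files.
files = [f"utt_{i:03d}.wav" for i in range(20)]
batches = list(epoch_chunks(files, 4))
covered = {u for b in batches for u in b}
print(len(batches), len(covered))  # 5 20
```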
@HarryVolek Can you check this, please? If that is the case, I will open a PR.