voxceleb_enrichment_age_gender

input dimension

Open zhangshaohu opened this issue 3 years ago • 2 comments

Hello!

ASVTorch generates 24 MFCCs, so the MFCCs have shape (n, 24). Your input is (200, 30). Where does the 30 come from? Could you please provide some test samples?

zhangshaohu, May 02 '22 04:05

Hi! The 30 comes from the number of Mel bins and cepstral coefficients (ceps) specified in the mfcc.conf file used by Kaldi: https://gitlab.com/ville.vestman/asvtorch/-/blob/master/asvtorch/recipes/voxceleb/xvector/configs/mfcc.conf
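For anyone who wants to check the feature dimension from a config file themselves, here is a minimal sketch. It assumes the usual Kaldi config layout of one `--name=value` option per line with `#` comments. The `parse_kaldi_conf` helper and the `EXAMPLE_CONF` contents are hypothetical and only illustrate the idea; read the linked mfcc.conf for the actual values.

```python
def parse_kaldi_conf(text):
    """Parse Kaldi-style '--name=value' option lines into a dict of strings."""
    options = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("--") or "=" not in line:
            continue  # skip blank lines and anything that is not an option
        name, value = line[2:].split("=", 1)
        options[name.strip()] = value.strip()
    return options


# Hypothetical config contents, for illustration only (not copied from the repo).
EXAMPLE_CONF = """
--sample-frequency=16000
--num-mel-bins=30
--num-ceps=30   # cepstral coefficients kept per frame
"""

opts = parse_kaldi_conf(EXAMPLE_CONF)
feature_dim = int(opts["num-ceps"])
print(f"Features have shape (n_frames, {feature_dim})")
```

With `num-ceps` set to 30, each frame carries 30 coefficients, which matches the second dimension of a (200, 30) input.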

Regarding the test sample, unfortunately the answer is no. The original dataset comes from YouTube videos, so various copyright issues may arise (also, the original VoxCeleb team should be, imo, the one to provide the raw tracks and devise appropriate sharing rules in their licence). We have, however, provided the lists of recordings we used for train and test, so it should be possible to replicate the setup by following the steps described in the paper and in the various notebooks.

hechmik, May 02 '22 04:05

Thank you for your immediate response. I ran into some errors with ASVtorch, so I used Kaldi instead. The original Kaldi config for VoxCeleb uses num-ceps=24 (https://github.com/kaldi-asr/kaldi/blob/master/egs/voxceleb/v1/conf/mfcc.conf), so I will update the value of num-ceps. Yes, the VoxCeleb data can be requested. Still, I think it would be fine to share a few precomputed features for testing. That way, someone who wants to replicate your code could do so with just a simple test example.
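As a rough sketch of that config change, the snippet below rewrites the `--num-ceps` line in Kaldi-style config text. `set_kaldi_option` is a hypothetical helper, not part of Kaldi or ASVtorch, and the `--num-mel-bins=30` line in the example is an assumed value. Note that Kaldi requires num-ceps to be no larger than num-mel-bins, so both options may need checking.

```python
import re


def set_kaldi_option(conf_text, name, value):
    """Return conf_text with '--name=<old>' replaced by '--name=<value>'.

    If the option is not present, it is appended as a new line.
    """
    pattern = re.compile(rf"^([ \t]*--{re.escape(name)}=)\S+", flags=re.MULTILINE)
    # A function replacement avoids backreference ambiguity with numeric values.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", conf_text)
    if count == 0:
        new_text = conf_text.rstrip("\n") + f"\n--{name}={value}\n"
    return new_text


# Assumed starting config, for illustration only.
original = "--num-mel-bins=30\n--num-ceps=24\n"
updated = set_kaldi_option(original, "num-ceps", 30)
print(updated)
```

After the change, recomputed MFCCs should have 30 coefficients per frame instead of 24.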

zhangshaohu, May 02 '22 04:05