Training PASE architecture for only Speaker ID using Librispeech data

Open hdubey opened this issue 5 years ago • 4 comments

Hi Mirco, Santi, Thanks again for this great contribution. I had a look at the code and the paper, and the architecture is interesting. I want to train this architecture on Librispeech for speaker ID in the same way SincNet is trained. What would be the best way to do it? Assume I have all training and test data prepared as per the protocol of the SincNet paper. Once it is trained, I want to extract supervised bottleneck features to see how the overall FER compares with the original SincNet.

hdubey avatar Apr 20 '19 07:04 hdubey

Hi @hdubey ,

Do you mean the mutual information training with SincNet (https://arxiv.org/pdf/1812.00271.pdf) or the purely supervised training?

If you mean the unsupervised mutual information training: I have just uploaded a config file, cfg/SincNet_worker.cfg, that incorporates the SincNet training mechanism as MI-only. To train with it, point the --net_cfg flag of the train.py script to that new config file.

If you mean the supervised training: have a look at spk_id/nnet.py. There you specify the PASE config with --fe_cfg ../cfg/PASE.cfg and no pretrained checkpoint (nothing in --fe_ckpt), and it will attach the selected classifier (--model mlp) on top of the front-end. The training/validation/test partitions are specified in a pretty standard way: --train_guia and --test_guia each take a list of filepath pointers, and validation is a randomly sampled subset of the --train_guia files (controlled by the ratio parameter --va_split, which defaults to 20%).

Hope this helps, Santi

santi-pdp avatar Apr 20 '19 08:04 santi-pdp
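
For readers following along, the validation hold-out described above (a random subset of the --train_guia files, sized by --va_split) can be pictured with the minimal sketch below. It is not the code in spk_id/nnet.py: the function name split_train_valid, the seed argument, and the rounding rule are assumptions made purely for illustration.

```python
import random


def split_train_valid(train_files, va_split=0.2, seed=0):
    """Hold out a random fraction of the training files for validation.

    Hypothetical helper: it mirrors the behaviour described in the thread
    (validation = random subset of the training list), not PASE's own code.
    """
    files = list(train_files)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(files)
    n_valid = int(round(len(files) * va_split))
    # The first n_valid shuffled files go to validation, the rest to training.
    return files[n_valid:], files[:n_valid]


if __name__ == "__main__":
    train, valid = split_train_valid([f"utt{i}.wav" for i in range(10)])
    print(len(train), len(valid))
```

With the 20% default, a 10-file list yields 8 training files and 2 validation files; the two sets never overlap.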

Hi Santi, Thanks for suggesting this. I just got the unsupervised MI training started. However, I am more interested in supervised speaker ID on Librispeech. When I run python spk_id/nnet.py I get the following error: "ImportError: No module named 'waveminionet'".

It is not clear how many arguments are needed to run the supervised one. I want to try an RNN classifier after the front-end; what would the command be in that case? Thanks!

hdubey avatar Apr 22 '19 06:04 hdubey

Hi @santi-pdp, I fixed the waveminionet issue. However, there seems to be a mandatory input, SPK2IDX (the --spk2idx argument). How do I generate it for Librispeech? Also, for the command below, how can I get the best parameter set for Librispeech to reproduce the supervised PASE speaker ID results that outperformed SincNet? Thanks!

nnet_copy.py [-h] [--fe_cfg FE_CFG] [--save_path SAVE_PATH] [--data_root DATA_ROOT] [--batch_size BATCH_SIZE] [--train_guia TRAIN_GUIA] [--test_guia TEST_GUIA] [--spk2idx SPK2IDX] [--log_freq LOG_FREQ] [--epoch EPOCH] [--patience PATIENCE] [--seed SEED] [--no-cuda] [--no-rnn] [--ft_fe] [--z_bnorm] [--va_split VA_SPLIT] [--lr LR] [--momentum MOMENTUM] [--max_len MAX_LEN] [--hidden_size HIDDEN_SIZE] [--emb_dim EMB_DIM] [--stats STATS] [--opt OPT] [--sched_mode SCHED_MODE] [--sched_step_size SCHED_STEP_SIZE] [--lrdec LRDEC] [--test_ckpt TEST_CKPT] [--fe_ckpt FE_CKPT] [--plateau_mode PLATEAU_MODE] [--model MODEL] [--train] [--test] [--test_log_file TEST_LOG_FILE] [--inorm_code] [--uni]

hdubey avatar Apr 22 '19 07:04 hdubey
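
The thread does not say what file format --spk2idx expects, so the following is only a sketch of the general idea: deriving a speaker-to-index mapping from LibriSpeech-style file names, where the speaker ID is the first dash-separated field (for example 1089-134686-0000.flac). The helper name build_spk2idx is hypothetical, and the serialization that nnet.py actually loads should be checked against its data loader before relying on this.

```python
from pathlib import PurePosixPath


def build_spk2idx(guia_lines):
    """Map each LibriSpeech speaker ID to a contiguous class index.

    Assumes every line is a path whose file name looks like
    <speaker>-<chapter>-<utterance>.<ext>; this layout is an assumption
    about the data, not something confirmed for the PASE scripts.
    """
    speakers = sorted(
        {
            PurePosixPath(line.strip()).name.split("-")[0]
            for line in guia_lines
            if line.strip()  # skip blank lines in the file list
        }
    )
    return {spk: idx for idx, spk in enumerate(speakers)}


if __name__ == "__main__":
    example = [
        "1089/134686/1089-134686-0000.flac",
        "121/127105/121-127105-0001.flac",
    ]
    print(build_spk2idx(example))
```

Sorting the speaker IDs before enumerating keeps the indices stable across runs, which matters if the same mapping is reused for training and testing.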

Hello! I have been replicating this experiment recently, but while making the dataset config file I could not find out where to obtain these files: --train_scp data/LibriSpeech/libri_tr.scp --test_scp data/LibriSpeech/libri_te.scp --libri_dict data/LibriSpeech/libri_dict.npy. I look forward to your reply. Thank you.

uuwz avatar Sep 07 '23 12:09 uuwz