Training PASE architecture for only Speaker ID using Librispeech data
Hi Mirco, Santi, thanks again for this great contribution. I had a look at the code and the paper; the architecture is interesting. I want to train it on LibriSpeech for speaker ID in the same way SincNet is trained. What would be the best way to do that? Assume I have all training and test data prepared per the protocols of the SincNet paper. After training, I want to extract the supervised bottleneck features to see how the overall FER compares with the original SincNet.
Hi @hdubey ,
Do you mean the mutual information training with SincNet (https://arxiv.org/pdf/1812.00271.pdf) or the purely supervised training? In case you mean the unsupervised mutual information training, I have just uploaded a config file cfg/SincNet_worker.cfg that incorporates the training mechanism of SincNet as an MI-only worker. The way to train it is to point the --net_cfg flag of the train.py script to this new config file.
If you mean the supervised training part, then have a look at spk_id/nnet.py, where you specify the PASE config with --fe_cfg ../cfg/PASE.cfg and no pretrained checkpoint (nothing in --fe_ckpt), and it will attach the selected classifier (--model mlp) on top of the front-end. In this case the way to specify the training/validation/test partitions is pretty standard: you hand --train_guia and --test_guia files with filepath pointers, and validation is selected as a randomly sampled subset of the --train_guia files (controlled with the ratio parameter --va_split, which defaults to 20%).
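To illustrate that last point, here is a minimal sketch of how such a random validation hold-out from the --train_guia list could work (a hypothetical helper for illustration, not the repo's actual code):

```python
import random

def split_train_va(files, va_split=0.2, seed=42):
    # Shuffle a copy of the training list and hold out a va_split
    # fraction of it for validation; returns (train, valid) lists.
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_va = int(len(shuffled) * va_split)
    return shuffled[n_va:], shuffled[:n_va]

train_files, va_files = split_train_va([f"utt_{i:03d}.wav" for i in range(100)])
print(len(train_files), len(va_files))  # 80 20
```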
Hope this helps, Santi
Hi Santi, thanks for suggesting this. I just got the unsupervised MI training started. However, I am more interested in supervised speaker ID on LibriSpeech. When I run python spk_id/nnet.py I get the following error: "ImportError: No module named 'waveminionet'".
It is not clear how many arguments are needed to run the supervised training. I also want to try an RNN classifier after the front-end; what would the command be in that case? Thanks!
Hi @santi-pdp I fixed the waveminionet issue. However, there seems to be a mandatory --spk2idx argument; how do I generate it for LibriSpeech? In the command below, what is the best parameter set for LibriSpeech to reproduce the supervised PASE speaker ID results that outperformed SincNet? Thanks!
nnet_copy.py [-h] [--fe_cfg FE_CFG] [--save_path SAVE_PATH] [--data_root DATA_ROOT] [--batch_size BATCH_SIZE] [--train_guia TRAIN_GUIA] [--test_guia TEST_GUIA] [--spk2idx SPK2IDX] [--log_freq LOG_FREQ] [--epoch EPOCH] [--patience PATIENCE] [--seed SEED] [--no-cuda] [--no-rnn] [--ft_fe] [--z_bnorm] [--va_split VA_SPLIT] [--lr LR] [--momentum MOMENTUM] [--max_len MAX_LEN] [--hidden_size HIDDEN_SIZE] [--emb_dim EMB_DIM] [--stats STATS] [--opt OPT] [--sched_mode SCHED_MODE] [--sched_step_size SCHED_STEP_SIZE] [--lrdec LRDEC] [--test_ckpt TEST_CKPT] [--fe_ckpt FE_CKPT] [--plateau_mode PLATEAU_MODE] [--model MODEL] [--train] [--test] [--test_log_file TEST_LOG_FILE] [--inorm_code] [--uni]
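Regarding --spk2idx: in LibriSpeech every utterance file name starts with the numeric speaker ID (e.g. 84-121123-0000.flac), so a speaker-to-index mapping can be built from the training file list itself. A minimal sketch (the exact on-disk format nnet.py expects for spk2idx is an assumption; check the repo's spk_id code):

```python
def build_spk2idx(utt_paths):
    # Collect the LibriSpeech speaker IDs (the prefix before the first '-'
    # in each file name) and assign each one a contiguous class index.
    speakers = sorted({p.split("/")[-1].split("-")[0] for p in utt_paths})
    return {spk: idx for idx, spk in enumerate(speakers)}

utts = ["84-121123-0000.wav", "84-121123-0001.wav", "174-50561-0000.wav"]
print(build_spk2idx(utts))  # {'174': 0, '84': 1}
```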
Hello! I have recently been replicating this experiment, but while preparing the dataset config file I could not find where to obtain these files (--train_scp data/LibriSpeech/libri_tr.scp --test_scp data/LibriSpeech/libri_te.scp --libri_dict data/LibriSpeech/libri_dict.npy). I look forward to your reply. Thank you.