AutoPST
How to train SEA model
The pretrained model sea.ckpt only fits a dataset with 82 speakers. However, I have a large dataset with at least 300 speakers. How can I train a corresponding SAE model?
Do you mean SEA?
You can refer to the SEA paper for training details.
I seem to have made a mistake. Actually, when preparing the data, only the Encoder part of the SEA model is used. But I'm not sure whether changing the speakers will make a difference.
Does it matter if I take my own data and extract the features with the 82-speaker SEA model that you pre-trained?
Yeah, sorry for the spelling mistake.
The performance might degrade, but feel free to try.
So the right thing to do is to train an SEA model on my own data and then extract the features. Could the SEA training code be provided?
The majority of the code for SEA is here. You just need a data loader and an optimizer.
OK, do you use a loss function like this?

Yes
@auspicious3000 what is c_trg in model_sea.Generator.forward? It is part of the input to the Decoder's LSTM, and its dimension is the same as hparams.dim_spk, which is 82, but I still have no idea how to get it ...
It is the one-hot speaker embedding.
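For anyone else wondering what that looks like in practice, here is a minimal sketch of building a one-hot speaker vector. It is not taken from the AutoPST code: the helper names `build_speaker_index` and `one_hot` and the speaker IDs are made up for illustration, and it assumes speakers are simply indexed in sorted order. With your own data, `dim_spk` would presumably have to match your speaker count rather than 82.

```python
def build_speaker_index(speaker_ids):
    """Map each distinct speaker ID to a stable integer index (sorted order)."""
    return {spk: i for i, spk in enumerate(sorted(set(speaker_ids)))}


def one_hot(index, dim_spk):
    """Return a list of length dim_spk with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < dim_spk:
        raise ValueError(f"index {index} out of range for dim_spk={dim_spk}")
    vec = [0.0] * dim_spk
    vec[index] = 1.0
    return vec


# Hypothetical speaker IDs, one per utterance; replace with your own metadata.
utterance_speakers = ["p225", "p226", "p225", "p227"]

spk2idx = build_speaker_index(utterance_speakers)
dim_spk = len(spk2idx)  # assumption: one slot per distinct speaker in the dataset

# One-hot vector for speaker "p226", playing the role of c_trg.
c_trg = one_hot(spk2idx["p226"], dim_spk)
```

Before it is passed to the model, the list would still need to be converted to a float tensor (for example with `torch.tensor`) and batched to match the other inputs.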
Hi! Could you point me to the SEA paper? I want to make sure I am reading the right one.
Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery
@auspicious3000 Could you check my code for the SEA training loss below?

    # cep_real0 is the MFCC that has not been cut by [:, 0:20]
    mask_sp_real = ~sequence_mask(len_real, cep_real0.size(1))
    mask = (~mask_sp_real).float()

    self.P = self.P.train()
    # mel_outputs_B is the decoder output when its input is the self-expressing autoencoded Z
    mel_outputs, mel_outputs_B = self.P(cep_real, spk_emb, mask)

    loss_A = F.mse_loss(mel_outputs, cep_real0, reduction='mean')
    loss_B = F.mse_loss(mel_outputs_B, cep_real0, reduction='mean')
    p_loss = loss_A + loss_B