
How to train SEA model

Open cyxomo opened this issue 4 years ago • 14 comments

The pretrained model sea.ckpt only fits a dataset with 82 speakers. However, I have a huge dataset with at least 300 speakers. How could I train a corresponding SAE model?

cyxomo avatar Aug 10 '21 03:08 cyxomo

Do you mean SEA?

You can refer to the SEA paper for training details.

auspicious3000 avatar Aug 10 '21 03:08 auspicious3000

I seem to have made a mistake. Actually, when preparing the data, only the encoder part of the SEA model is used. But I'm not sure whether changing the speakers will make a difference.

cyxomo avatar Aug 10 '21 03:08 cyxomo

Does it matter if I take my own data and extract features with the 82-speaker SEA model that you pretrained?

cyxomo avatar Aug 10 '21 03:08 cyxomo

> Do you mean SEA?
>
> You can refer to the SEA paper for training details.

Yeah, sorry for the spelling mistake.

cyxomo avatar Aug 10 '21 03:08 cyxomo

The performance might degrade, but feel free to try.

auspicious3000 avatar Aug 10 '21 03:08 auspicious3000

> The performance might degrade, but feel free to try.

So the right thing to do is to train an SEA model on my own data and then extract the features. Could the SEA training code be provided?

cyxomo avatar Aug 10 '21 03:08 cyxomo

The majority of the code for SEA is here. You just need a data loader and an optimizer.

auspicious3000 avatar Aug 10 '21 03:08 auspicious3000
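To make that concrete, below is a minimal training-loop sketch around the repository's model_sea.Generator. Everything outside the model itself is an assumption: the constructor argument, the forward signature (inferred from the training snippet later in this thread), the feature dimensions, and the synthetic stand-in data, which is only a hypothetical placeholder for a real data loader.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

from model_sea import Generator  # SEA model from this repository
from hparams import hparams

# --- hypothetical stand-in data (replace with a real data loader) ---
# Shapes are assumptions: 20-dim cut MFCCs as input (cep_real in this
# thread), full MFCCs as the reconstruction target (cep_real0), one-hot
# speaker vectors, and a float mask over valid frames.
n_utt, n_frames, n_spk = 64, 128, hparams.dim_spk
cep_real = torch.randn(n_utt, n_frames, 20)
cep_real0 = torch.randn(n_utt, n_frames, 80)
spk_emb = F.one_hot(torch.randint(n_spk, (n_utt,)), n_spk).float()
mask = torch.ones(n_utt, n_frames)
loader = DataLoader(TensorDataset(cep_real, cep_real0, spk_emb, mask),
                    batch_size=16, shuffle=True)

model = Generator(hparams)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(100):  # number of epochs chosen arbitrarily
    for x, x0, spk, m in loader:
        # Two reconstructions: one from the encoder code and one from
        # the self-expressed code (call signature inferred from the
        # training snippet later in this thread).
        out_A, out_B = model(x, spk, m)
        loss = F.mse_loss(out_A, x0) + F.mse_loss(out_B, x0)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```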

> The majority of the code for SEA is here. You just need a data loader and an optimizer.

OK, do you use a loss function like the one below?
[image: screenshot of the loss function]

cyxomo avatar Aug 10 '21 03:08 cyxomo

Yes

auspicious3000 avatar Aug 10 '21 04:08 auspicious3000
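For reference, the loss being confirmed here appears, judging from the training snippet later in this thread (loss_A + loss_B over two decoder outputs), to be a two-term reconstruction objective. The following rendering is an inference, not a transcription of the original image:

```latex
\mathcal{L} = \lVert \hat{X}_A - X \rVert_2^2 + \lVert \hat{X}_B - X \rVert_2^2
```

where $X$ is the target feature sequence, $\hat{X}_A$ is the decoder output from the encoder code, and $\hat{X}_B$ is the decoder output from the self-expressed code $Z$.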

@auspicious3000 What is c_trg in model_sea.Generator.forward? It is part of the decoder's LSTM input, and its dimension is the same as hparams.dim_spk, which is 82, but I still have no idea how to get it...

vasyarv avatar Sep 05 '21 14:09 vasyarv

It is the one-hot speaker embedding.

auspicious3000 avatar Sep 05 '21 15:09 auspicious3000
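For illustration, such a one-hot c_trg can be built as follows; the speaker index here is a hypothetical example, and dim_spk = 82 matches the pretrained checkpoint:

```python
import torch
import torch.nn.functional as F

dim_spk = 82                 # hparams.dim_spk for the pretrained model
spk_idx = torch.tensor([5])  # hypothetical: index of the target speaker

# (batch, dim_spk) one-hot vector passed as c_trg to Generator.forward
c_trg = F.one_hot(spk_idx, num_classes=dim_spk).float()
```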

> Do you mean SEA?
>
> You can refer to the SEA paper for training details.

Hi! Could you point me to the SEA paper? I want to make sure I am reading the right one.

stalevna avatar Nov 08 '21 06:11 stalevna

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

auspicious3000 avatar Nov 08 '21 06:11 auspicious3000

@auspicious3000 Could you check my code for the SEA training loss below:

```python
mask_sp_real = ~sequence_mask(len_real, cep_real0.size(1))  # cep_real0 is the full MFCC, not cut to [:, 0:20]
mask = (~mask_sp_real).float()
self.P.train()
# mel_outputs_B is the decoder output fed with the self-expressed autoencoded Z
mel_outputs, mel_outputs_B = self.P(cep_real, spk_emb, mask)
loss_A = F.mse_loss(mel_outputs, cep_real0, reduction='mean')
loss_B = F.mse_loss(mel_outputs_B, cep_real0, reduction='mean')
p_loss = loss_A + loss_B
```

wang1612 avatar Nov 16 '21 06:11 wang1612
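The sequence_mask helper in the snippet above is not shown in this thread. A typical implementation (an assumption; the repository may define it differently) returns True at valid frames and False at padding, so that ~sequence_mask(...) marks the padded positions and mask = (~mask_sp_real).float() recovers a float mask over valid frames:

```python
import torch

def sequence_mask(lengths, max_len=None):
    # True at valid time steps, False at padded ones.
    # lengths: (batch,) tensor of per-utterance frame counts.
    if max_len is None:
        max_len = int(lengths.max())
    steps = torch.arange(max_len, device=lengths.device)
    return steps.unsqueeze(0) < lengths.unsqueeze(1)
```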