
F0 Converter for P - loss function values

Open rishabhjain16 opened this issue 3 years ago • 7 comments

I am trying to replicate your work. I am currently training the F0 converter model to generate the P checkpoint, and I am stuck at the loss calculation.

I see that when I use the F0_Converter model (P), I get a 257-dimensional one-hot encoded feature as output.

Demo.ipynb

f0_pred = P(uttr_org_pad, f0_trg_onehot)[0]
f0_pred.shape
> torch.Size([192, 257])

I wanted to ask you when training the F0 converter model, what is the value that you are using to calculate the loss?

I tried the following, but I am not sure if it is the right way. This is what I am doing to generate f0_pred and calculate the loss:

f0_pred = self.P(x_real_org, f0_org_intrp)[0]
p_loss_id = F.mse_loss(f0_pred, f0_org_intrp, reduction='mean')

I just want to know if I am on the right track. Can you help me out here, @auspicious3000?

rishabhjain16 avatar Apr 14 '21 18:04 rishabhjain16

The output of the f0 predictor is a 257-dim logit vector per frame, not a one-hot vector. So you need to use cross-entropy loss, as indicated in the paper.
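
For reference, a minimal sketch of the shapes involved (names and sizes here are illustrative, not from the repo):

    import torch
    import torch.nn.functional as F

    B, T, C = 2, 192, 257                     # batch, frames, f0 bins (256 + 1 unvoiced)
    f0_logits = torch.randn(B, T, C)          # raw logits from the f0 predictor
    f0_target = torch.randint(0, C, (B, T))   # ground-truth bin index per frame

    # F.cross_entropy applies log-softmax internally, so the logits stay raw;
    # it expects (B, C, T) logits and (B, T) class indices.
    loss = F.cross_entropy(f0_logits.transpose(1, 2), f0_target)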

auspicious3000 avatar Apr 14 '21 20:04 auspicious3000

Thank you for your quick response. I understand what you are saying; I found that in the appendix of the paper. What I meant to ask about are the two values you are using to calculate the loss. How are you getting f0_org in 257 dimensions to feed into the loss function?

The loss function requires two values. One is f0_pred, the output of the F0_Converter model. What is the other value?

In other words, what is the target for the cross-entropy loss?

rishabhjain16 avatar Apr 14 '21 20:04 rishabhjain16

The target is the quantized ground-truth f0, following https://arxiv.org/abs/2004.07370
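
A minimal sketch of such a quantizer, loosely mirroring quantize_f0_torch in the repo's utils.py (assumptions: f0 is already normalized to [0, 1], unvoiced frames are marked by values <= 0, and bin 0 is reserved for unvoiced, leaving bins 1-256 for pitch):

    import torch
    import torch.nn.functional as F

    def quantize_f0(f0, num_bins=256):
        # f0: (T,) normalized pitch contour; values <= 0 mean unvoiced
        uv = f0 <= 0                                                    # unvoiced mask
        idx = torch.round(f0.clamp(0, 1) * (num_bins - 1)).long() + 1   # bins 1..256
        idx[uv] = 0                                                     # unvoiced -> bin 0
        onehot = F.one_hot(idx, num_bins + 1).float()                   # (T, 257)
        return onehot, idx                # one-hot target and per-frame class indices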

auspicious3000 avatar Apr 15 '21 01:04 auspicious3000

Thanks for your help. The paper covered most of my doubts. Great read.

rishabhjain16 avatar Apr 15 '21 14:04 rishabhjain16

In the 'Train the generator' section of solver.py:

        self.G = self.G.train()
        self.P = self.P.train()
                    
        # G Identity mapping loss
        x_f0 = torch.cat((x_real_org, f0_org), dim=-1)
        x_f0_intrp = self.Interp(x_f0, len_org) 

        f0_org_intrp = quantize_f0_torch(x_f0_intrp[:,:,-1])[0]
        x_f0_intrp_org = torch.cat((x_f0_intrp[:,:,:-1], f0_org_intrp), dim=-1)

        # G forward
        x_pred = self.G(x_f0_intrp_org, x_real_org, emb_org)
        g_loss_id = F.mse_loss(x_pred, x_real_org, reduction='mean')

        
        # Preprocess f0_trg for P 
        x_f0_trg = torch.cat((x_real_trg, f0_trg), dim=-1)
        x_f0_intrp_trg = self.Interp(x_f0_trg, len_trg) 

        # Target for P
        f0_trg_intrp = quantize_f0_torch(x_f0_intrp_trg[:,:,-1])[0]

        # P forward (raw logits, shape (B, T, 257))
        f0_pred = self.P(x_real_org, f0_trg_intrp)
        # class index per frame, recovered from the one-hot quantized target
        f0_trg_intrp_indx = f0_trg_intrp.argmax(2)
        # cross_entropy expects (B, C, T) logits and (B, T) class indices
        p_loss_id = F.cross_entropy(f0_pred.transpose(1,2), f0_trg_intrp_indx, reduction='mean')

        # Backward and optimize.
        g_loss = g_loss_id
        p_loss = p_loss_id
        self.reset_grad()
        g_loss.backward()
        p_loss.backward()
        self.g_optimizer.step()
        self.p_optimizer.step()

        # Logging.
        loss = {}
        loss['G/loss_id'] = g_loss_id.item()
        loss['P/loss_id'] = p_loss_id.item()

This appears to be working for me (i.e. it seems to run, at least!).

Merlin-721 avatar Jul 21 '21 14:07 Merlin-721

> In the 'Train the generator' section of solver.py: […] This appears to be working for me.

Hello, I want to know where x_real_trg comes from...

3139725181 avatar Sep 25 '21 04:09 3139725181

I've changed some of the code around since then, but hopefully this helps a bit. Both 'org' and 'trg' are just different instances; I had tried applying code from elsewhere in the repo to training, so I used those naming conventions. You can see here that I've used the same instance to train both models:

            x_real_org, emb_org, f0_org, len_org = next(data_iter)
            # applies .to(self.device) to each:
            x_real_org, emb_org, len_org, f0_org = self.data_to_device([x_real_org, emb_org, len_org, f0_org])

            # combines spect and f0s
            x_f0 = torch.cat((x_real_org, f0_org), dim=-1)
            # Random resampling with linear interpolation
            x_f0_intrp = self.Interp(x_f0, len_org) 
            # strips f0 from trimmed to quantize it
            f0_org_intrp = quantize_f0_torch(x_f0_intrp[:,:,-1])[0]

            self.G = self.G.train()
            # combines quantized f0 back with spect
            x_f0_intrp_org = torch.cat((x_f0_intrp[:,:,:-1], f0_org_intrp), dim=-1)

            # G forward
            x_pred = self.G(x_f0_intrp_org, x_real_org, emb_org)
            g_loss_id = F.mse_loss(x_pred, x_real_org, reduction='mean') 

            # Backward and optimize.
            self.g_optimizer.zero_grad()
            g_loss_id.backward()
            self.g_optimizer.step()

            loss['G/loss_id'] = g_loss_id.item()

          # =================================================================================== #
          #                               3. F0_Converter Training                              #
          # =================================================================================== #


            self.P = self.P.train()
            # class index per frame, recovered from the one-hot quantized f0
            f0_trg_intrp_indx = f0_org_intrp.argmax(2)

            # P forward (raw logits, shape (B, T, 257))
            f0_pred = self.P(x_real_org, f0_org_intrp)
            # cross_entropy expects (B, C, T) logits and (B, T) class indices
            p_loss_id = F.cross_entropy(f0_pred.transpose(1,2), f0_trg_intrp_indx, reduction='mean')


            self.p_optimizer.zero_grad()
            p_loss_id.backward()
            self.p_optimizer.step()
            loss['P/loss_id'] = p_loss_id.item()
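
As a side note, at inference time the 257-dim logits from P get collapsed back into a one-hot contour before being fed to G, roughly as in Demo.ipynb (a sketch only; uttr_org_pad and f0_trg_onehot follow the demo's naming):

    import torch

    # f0_pred: (T, 257) logits after indexing out the batch dim, as in Demo.ipynb
    f0_pred = P(uttr_org_pad, f0_trg_onehot)[0]
    f0_pred_idx = f0_pred.argmax(dim=-1)               # (T,) predicted bin per frame
    f0_con_onehot = torch.zeros(1, f0_pred.shape[0], 257)
    f0_con_onehot[0, torch.arange(f0_pred.shape[0]), f0_pred_idx] = 1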

Merlin-721 avatar Sep 25 '21 13:09 Merlin-721