Li-Wei Chen

19 comments by Li-Wei Chen

During training, you can pass the target Mel-spectrogram length to `self.Ep` and `self.u2m` (see lines 123 and 126 in `trainer.py`) to force the output shape to be the same as...
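The shape-matching idea above can be sketched in isolation. Note that `force_mel_length` is a hypothetical helper, not a function from the repo; in the actual code the target length is passed into `self.Ep` and `self.u2m` directly, but the effect on the output shape is the same as trimming or padding to the target frame count:

```python
import numpy as np

def force_mel_length(mel, target_len):
    """Trim or zero-pad a (T, n_mels) Mel-spectrogram to target_len frames.

    Illustrative only: shows why forcing the output length makes the
    prediction comparable frame-by-frame with the ground-truth Mel.
    """
    T, n_mels = mel.shape
    if T >= target_len:
        return mel[:target_len]
    pad = np.zeros((target_len - T, n_mels), dtype=mel.dtype)
    return np.concatenate([mel, pad], axis=0)

out = force_mel_length(np.zeros((95, 80)), 100)
print(out.shape)  # → (100, 80)
```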

It's hard to judge without the code. I can think of some pitfalls: the only trainable modules should be `self.RVEncoder.wav2vec2.spk_encoder` and `self.RVEncoder.linear_spk` if we want to target the speaker encoder...
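One way to check this is to decide trainability from parameter names. A minimal sketch (in PyTorch you would apply the resulting mask via `p.requires_grad = mask[name]` over `model.named_parameters()`; the prefix-matching helper itself is hypothetical, only the module names come from the comment above):

```python
def trainable_mask(param_names,
                   train_prefixes=("RVEncoder.wav2vec2.spk_encoder",
                                   "RVEncoder.linear_spk")):
    """Map each parameter name to whether it should stay trainable.

    Everything outside the listed prefixes is frozen, so only the
    speaker-encoder path receives gradient updates.
    """
    return {name: name.startswith(train_prefixes) for name in param_names}

mask = trainable_mask(["RVEncoder.linear_spk.weight", "decoder.conv.weight"])
print(mask)  # → {'RVEncoder.linear_spk.weight': True, 'decoder.conv.weight': False}
```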

I see. Hmm. I'm not sure unplugging the adversarial loss will have that huge an effect on performance. I am instead wary about discarding the speaker encoder and replacing...

1. I mean that without the adversarial loss it should still work; the original TSA also uses only an L1 loss. But it didn't, so maybe something else isn't working....

I think the learning rate is too high. Can you try `5e-5`? Also did the loss actually decrease? Yeah it would be great to have some samples.
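On "did the loss actually decrease": eyeballing a noisy loss curve is unreliable, so a crude trend check can help. This is a generic hypothetical helper, not part of the repo's trainer:

```python
import statistics

def loss_decreasing(losses, window=10):
    """Crude sanity check: compare the mean of the first and last
    `window` recorded losses. Returns None if there is not enough
    history to tell.
    """
    if len(losses) < 2 * window:
        return None
    return statistics.mean(losses[-window:]) < statistics.mean(losses[:window])

print(loss_decreasing(list(range(30, 0, -1))))  # → True
```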

> Also, why do we need https://github.com/b04901014/UUVC/blob/master/inference_exact_pitch.py#L162 ? I had to comment it out to make the shapes match

You are right. It should be redundant in this context since...

> It is indeed high. I deliberately increased it while doing initial experiments and forgot to revert it back. Yes, loss is going down. Tried `5e-5` but the improvement felt...

> Also, the reconstructed audios have different volumes compared to the original audios. Any idea why is that?

I used loudness normalization in `inference.py`; try to comment out...
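The volume mismatch comes from that normalization step rescaling the output independently of the input. If you want the reconstruction at the original volume instead, one simple alternative is to match RMS against the source waveform. This is a hypothetical sketch, not the repo's loudness normalization (which targets a fixed loudness level):

```python
import numpy as np

def match_rms(output, reference, eps=1e-8):
    """Rescale `output` so its RMS energy matches `reference`.

    A crude stand-in for proper loudness normalization: it makes the
    reconstructed audio roughly as loud as the original.
    """
    rms_out = np.sqrt(np.mean(output ** 2)) + eps
    rms_ref = np.sqrt(np.mean(reference ** 2)) + eps
    return output * (rms_ref / rms_out)

y = match_rms(np.ones(100) * 0.1, np.ones(100) * 0.5)
print(round(float(np.sqrt(np.mean(y ** 2))), 4))  # → 0.5
```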

The identity really seems to get better! But there is some weird noise there that should originally be caught by the adversarial loss. What you are doing sounds good...
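For reference, the kind of adversarial term being discussed is typically a hinge loss on discriminator logits, as used in many GAN vocoders. This numpy sketch only illustrates the formulation; the repo's exact discriminator and loss may differ:

```python
import numpy as np

def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss: push real logits above +1
    and fake logits below -1."""
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def hinge_g_loss(fake_logits):
    """Generator term: raise the discriminator's score on fakes,
    which is what penalizes audible artifact noise."""
    return -np.mean(fake_logits)

print(hinge_d_loss(np.array([2.0]), np.array([-2.0])))  # → 0.0
```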