Li-Wei Chen

19 comments by Li-Wei Chen

During training, you can pass the target Mel-spectrogram length to `self.Ep` and `self.u2m` (see lines 123 and 126 in `trainer.py`) to force the output shape to be the same as...
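The shape-matching idea above can be sketched in isolation. Note that `force_mel_length` is a hypothetical helper, not a function from the repo; in the actual code the target length is passed into `self.Ep` and `self.u2m` directly, but the effect on the output shape is the same as trimming or padding to the target frame count:

```python
import numpy as np

def force_mel_length(mel, target_len):
    """Trim or zero-pad a (T, n_mels) Mel-spectrogram to target_len frames.

    Illustrative only: shows why forcing the output length makes the
    prediction comparable frame-by-frame with the ground-truth Mel.
    """
    T, n_mels = mel.shape
    if T >= target_len:
        return mel[:target_len]
    pad = np.zeros((target_len - T, n_mels), dtype=mel.dtype)
    return np.concatenate([mel, pad], axis=0)

out = force_mel_length(np.zeros((95, 80)), 100)
print(out.shape)  # → (100, 80)
```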

It's hard to judge without the code. I can think of some pitfalls: the only trainable modules should be `self.RVEncoder.wav2vec2.spk_encoder` and `self.RVEncoder.linear_spk` if we want to target the speaker encoder...
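One way to check this is to decide trainability from parameter names. A minimal sketch (in PyTorch you would apply the resulting mask via `p.requires_grad = mask[name]` over `model.named_parameters()`; the prefix-matching helper itself is hypothetical, only the module names come from the comment above):

```python
def trainable_mask(param_names,
                   train_prefixes=("RVEncoder.wav2vec2.spk_encoder",
                                   "RVEncoder.linear_spk")):
    """Map each parameter name to whether it should stay trainable.

    Everything outside the listed prefixes is frozen, so only the
    speaker-encoder path receives gradient updates.
    """
    return {name: name.startswith(train_prefixes) for name in param_names}

mask = trainable_mask(["RVEncoder.linear_spk.weight", "decoder.conv.weight"])
print(mask)  # → {'RVEncoder.linear_spk.weight': True, 'decoder.conv.weight': False}
```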

I see. Hmm. I'm not sure unplugging the adversarial loss will have that huge an effect on performance. I am instead wary about discarding the speaker encoder and replacing...

1. I mean that without the adversarial loss it should still work; the original TSA also uses only an L1 loss. But it didn't, so maybe something else isn't working....

I think the learning rate is too high. Can you try `5e-5`? Also did the loss actually decrease? Yeah it would be great to have some samples.
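On "did the loss actually decrease": eyeballing a noisy loss curve is unreliable, so a crude trend check can help. This is a generic hypothetical helper, not part of the repo's trainer:

```python
import statistics

def loss_decreasing(losses, window=10):
    """Crude sanity check: compare the mean of the first and last
    `window` recorded losses. Returns None if there is not enough
    history to tell.
    """
    if len(losses) < 2 * window:
        return None
    return statistics.mean(losses[-window:]) < statistics.mean(losses[:window])

print(loss_decreasing(list(range(30, 0, -1))))  # → True
```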

> Also, why do we need https://github.com/b04901014/UUVC/blob/master/inference_exact_pitch.py#L162 ? I had to comment it out to make the shapes match

You are right. It should be redundant in this context since...

> It is indeed high. I deliberately increased it while doing initial experiments and forgot to revert it back. Yes, loss is going down. Tried `5e-5` but the improvement felt...

> Also, the reconstructed audios have different volumes compared to the original audios. Any idea why is that?

I used loudness normalization in `inference.py`; try to comment out...
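The volume mismatch comes from that normalization step rescaling the output independently of the input. If you want the reconstruction at the original volume instead, one simple alternative is to match RMS against the source waveform. This is a hypothetical sketch, not the repo's loudness normalization (which targets a fixed loudness level):

```python
import numpy as np

def match_rms(output, reference, eps=1e-8):
    """Rescale `output` so its RMS energy matches `reference`.

    A crude stand-in for proper loudness normalization: it makes the
    reconstructed audio roughly as loud as the original.
    """
    rms_out = np.sqrt(np.mean(output ** 2)) + eps
    rms_ref = np.sqrt(np.mean(reference ** 2)) + eps
    return output * (rms_ref / rms_out)

y = match_rms(np.ones(100) * 0.1, np.ones(100) * 0.5)
print(round(float(np.sqrt(np.mean(y ** 2))), 4))  # → 0.5
```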

The identity really seems to get better! But there is some weird noise there that should originally be caught by the adversarial loss. What you are doing sounds good...
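For reference, the kind of adversarial term being discussed is typically a hinge loss on discriminator logits, as used in many GAN vocoders. This numpy sketch only illustrates the formulation; the repo's exact discriminator and loss may differ:

```python
import numpy as np

def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss: push real logits above +1
    and fake logits below -1."""
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def hinge_g_loss(fake_logits):
    """Generator term: raise the discriminator's score on fakes,
    which is what penalizes audible artifact noise."""
    return -np.mean(fake_logits)

print(hinge_d_loss(np.array([2.0]), np.array([-2.0])))  # → 0.0
```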