Results 10 comments of vishalbhavani

Thanks for the clarification. I was more concerned about the identity mismatch than the accent difference. The voice texture of Boman and Prosenjit is not preserved in the generated audio....

Thanks for the confirmation. I can send a PR with TSA support. I am facing an issue: when I try to use the predicted mels from the code in...

Directly optimizing for the L1 loss using the code in inference.py (with the mel length fix) results in further deterioration. I can see that the forward passes in training and inference differ....

Actually, I kept the speaker encoding tensor trainable (initialized to `tgt_attributes["a_s"]`) and froze all parameters of `Tester`. Based on your suggestion, I used `inference_exact_pitch`, which improved the results. Trained on...
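The setup described above can be sketched as follows. This is a minimal NumPy illustration, not the actual UUVC code: a fixed random linear map stands in for the frozen `Tester`, and only the speaker vector receives L1-subgradient updates.

```python
import numpy as np

# Hypothetical stand-in for the frozen synthesizer: a fixed linear map from
# the speaker vector a_s to a mel "frame". This is NOT the real Tester model,
# just a proxy to illustrate the TSA-style optimization loop.
rng = np.random.default_rng(0)
W = rng.normal(size=(80, 16))      # frozen "model" weights (never updated)
target_mel = rng.normal(size=80)   # mel features of the target audio

a_s = np.zeros(16)                 # trainable speaker vector
lr = 1e-3
losses = []
for _ in range(5000):
    pred = W @ a_s
    losses.append(np.abs(pred - target_mel).mean())
    grad = W.T @ np.sign(pred - target_mel)  # subgradient of the L1 loss
    a_s -= lr * grad                         # only a_s is updated; W stays frozen
```

In the real setting the gradient would come from autograd through the frozen network rather than this closed-form subgradient.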

1. You mean plugging in the adversarial loss, right? I haven't used it yet. 2. The TSA proposed in NANSY does exactly this: it keeps the speaker representation trainable and backprops...

I agree with both. Sharing the code. Let me know if you want to look at the audio samples as well. [code.zip](https://github.com/b04901014/UUVC/files/10087631/code.zip) inference_exact_pitch.py contains the exact original code which is...

It is indeed high. I deliberately increased it during initial experiments and forgot to revert it. Yes, the loss is going down. I tried `5e-5`, but the improvement felt slow....

Also, why do we need https://github.com/b04901014/UUVC/blob/master/inference_exact_pitch.py#L162? I had to comment that line out to make the shapes match.

Also, the reconstructed audios have different volumes compared to the original audios. Any idea why that is?
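One common culprit for a volume mismatch is a plain energy difference between reconstruction and source. As a quick check (a hypothetical post-processing step, not part of the repo), the reconstruction can be rescaled so its RMS energy matches the original:

```python
import numpy as np

def match_rms(recon: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale `recon` so its RMS energy matches `ref`'s."""
    rms_ref = np.sqrt(np.mean(ref ** 2))
    rms_rec = np.sqrt(np.mean(recon ** 2))
    return recon * (rms_ref / (rms_rec + eps))

# Example: a quiet copy of a signal is brought back to the reference level.
t = np.linspace(0.0, 1.0, 16000)
ref = np.sin(2 * np.pi * 220.0 * t)
quiet = 0.3 * ref
fixed = match_rms(quiet, ref)
```

If the levels still differ after this, the mismatch is more likely spectral (e.g. in the predicted mel energies) than a simple gain issue.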

So my current code has the following changes:
1. TSA to reconstruct the target audio
2. Speaker identity as a parameter, with the model frozen completely
3. Use exact duration and...
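Items 1-2 above can be sketched in PyTorch. The wrapper name, the model's call signature, and the `init_speaker` argument are all assumptions for illustration; the actual UUVC interfaces may differ.

```python
import torch

class SpeakerTSA(torch.nn.Module):
    """Freeze a pretrained model and expose only the speaker vector to training."""

    def __init__(self, model: torch.nn.Module, init_speaker: torch.Tensor):
        super().__init__()
        self.model = model
        for p in self.model.parameters():
            p.requires_grad_(False)                          # freeze the model completely
        self.a_s = torch.nn.Parameter(init_speaker.clone())  # trainable speaker identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed call signature: the wrapped model consumes the content
        # features x together with the speaker vector.
        return self.model(x, self.a_s)

# Toy stand-in model with the assumed (x, a_s) signature.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 80)

    def forward(self, x, a_s):
        return x + self.proj(a_s)

tsa = SpeakerTSA(ToyModel(), torch.zeros(16))
trainable = [name for name, p in tsa.named_parameters() if p.requires_grad]
```

With this wrapping, an optimizer constructed over `tsa.parameters()` effectively updates only `a_s`, which matches the TSA recipe described in the comments above.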