Aaron (Yinghao) Li
The normalization may have amplified the noise slightly, yet the point of the log mel spec is actually the opposite: it tries to emphasize the speech instead of the noise....
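For context, here is a minimal sketch of a log-mel normalization step of the kind discussed above. It assumes the log is taken as `log(eps + mel)` and then standardized with fixed constants. The names `normalize_log_mel` and `denormalize_log_mel` are hypothetical, and the mean/std values are placeholders rather than confirmed values from the repo. The sketch shows that this step is a fixed, invertible affine map applied to every bin after the log compression.

```python
import numpy as np

# Placeholder constants (assumed for illustration, not taken from the repo).
LOG_MEL_MEAN = -4.0
LOG_MEL_STD = 4.0
EPS = 1e-5  # avoids log(0) on silent bins


def normalize_log_mel(mel: np.ndarray) -> np.ndarray:
    """Log-compress a non-negative mel spectrogram, then standardize it
    with the same fixed mean/std for every time-frequency bin."""
    return (np.log(EPS + mel) - LOG_MEL_MEAN) / LOG_MEL_STD


def denormalize_log_mel(z: np.ndarray) -> np.ndarray:
    """Invert normalize_log_mel back to the linear mel magnitudes."""
    return np.exp(z * LOG_MEL_STD + LOG_MEL_MEAN) - EPS


# Usage: a loud "speech" bin and a quiet "noise" bin.
mel = np.array([1.0, 0.01])
z = normalize_log_mel(mel)
recovered = denormalize_log_mel(z)
```

Because the standardization uses the same constants everywhere, it cannot single out noise bins; any change in the relative level of quiet versus loud bins comes from the log step.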
@Charlottecuc Sorry for the late reply. I was pretty busy at the end of the year. You can make all `x_input` corrupted, but I'd recommend you set each transformation with...
@Charlottecuc Sorry for the late reply; this issue was closed, so I didn't get any notification. Not sure if it has been resolved, but what I meant was...
@skol101 You need to pass in a noisy version here; call it `x_input`. The `x_input` is processed in `meldataset.py` with noise and reverberation.
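A rough sketch of that kind of corruption, assuming white Gaussian noise mixed in at a chosen SNR and a synthetic, exponentially decaying room impulse response. The helpers `add_noise`, `synthetic_rir`, and `add_reverb` are hypothetical and are not the actual code in `meldataset.py`; they only illustrate how a clean waveform could be turned into a noisy `x_input`.

```python
import numpy as np


def add_noise(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so the result has exactly `snr_db` dB SNR."""
    signal_power = np.mean(x ** 2)
    noise = rng.standard_normal(x.shape)
    noise_power = np.mean(noise ** 2)
    # Choose scale so that signal_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return x + scale * noise


def synthetic_rir(sr: int, rt60: float, rng: np.random.Generator) -> np.ndarray:
    """Crude room impulse response: noise under an envelope that decays 60 dB over rt60 seconds."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    # exp(-3 * ln(10) * t / rt60) equals 10**-3 (i.e. -60 dB) at t == rt60.
    envelope = np.exp(-3 * np.log(10) * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def add_reverb(x: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with the impulse response, keep the original length and peak level."""
    y = np.convolve(x, rir)[: len(x)]
    return y * (np.max(np.abs(x)) / (np.max(np.abs(y)) + 1e-8))


# Usage: corrupt one second of a 220 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)
rir = synthetic_rir(sr, rt60=0.5, rng=rng)
reverberant = add_reverb(clean, rir)
x_input = add_noise(reverberant, snr_db=10.0, rng=rng)
```

Reverb is applied before noise here so the added noise stays "dry", which is a common but by no means required ordering.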
@Charlottecuc Sorry, I'm pretty busy with my other paper submissions, so I can't join the discussion at this point, but I have reopened the issue for further discussion and will...
@Charlottecuc I do have some time now to discuss this problem. I have noticed similar problems with noisy input and have not yet come up with a good solution. The...
I believe it could simply be because there's not enough training data. Any-to-many conversion requires a lot of input data for the model to generalize well.
@skol101 I don't believe so. If it's not for any-to-any, you only need to have a lot of input speakers. You do not need cycle loss in this case, because...
@skol101 I don't think you need this either. It is there to make the style encoder speaker-independent so you can convert to any *output* speaker. If you are only interested in...
What you are asking is an open research question that nobody has an answer to at this point, but I will give you my two cents on this issue. It...