Aaron (Yinghao) Li

Results 110 comments of Aaron (Yinghao) Li

Sorry for the late reply. I hope you've got some good results using x-vector, though I believe it would not work better than style encoder alone because x-vector has much...

I think it depends on the number of speakers you have in the training set and what your latent space of the speaker embedding looks like. Usually, a multivariate Gaussian...

I think 1300y_out is very similar to Ref_wav, so the good news is that the generator is capable of reconstructing unseen speakers without any further training. Have you tried to...

This looks promising, so the problem probably is in the style encoder then. Can I know how many speakers you used to train the style encoder and how many discriminators...

I have listened to "0y_out_huangmeixi_error_f0" you uploaded and if I understand correctly, you probably think the style is somehow "overfitted" in the sense that it also encodes the F0 of...

I don't think StyleGAN2 is relevant to StarGANv2, because the main difference in StyleGAN2 is that they changed the instance normalization to one without the affine component (i.e., only normalize and learn the...

Back to the style encoder problem, how do you encode unseen speakers if you have unshared components?

@980202006 It's definitely possible to add wavelet transform to the model and it could theoretically make a big difference because the high-frequency content is what makes speech clear even though...

Back to the style encoder problem, so I think you removed the shared linear layers (N of them where N is the number of speakers) and replaced them with a...

@980202006 70 speakers are definitely not enough, so increasing the number of speakers probably would help. Separate projections may or may not make a difference and that's what I think...