Aaron (Yinghao) Li comments

Results: 110 comments of Aaron (Yinghao) Li

Some doubt about any to any voice conversion

Sorry for the late reply. I hope you've got some good results using x-vector, though I believe it would not work better than the style encoder alone because x-vector has much...

Some doubt about any to any voice conversion

I think it depends on the number of speakers you have in the training set and what the latent space of your speaker embedding looks like. Usually, a multivariate Gaussian...

Some doubt about any to any voice conversion

I think 1300y_out is very similar to Ref_wav, so the good news is that the generator is capable of reconstructing unseen speakers without any further training. Have you tried to...

Some doubt about any to any voice conversion

This looks promising, so the problem probably is in the style encoder then. Can I know how many speakers you used to train the style encoder and how many discriminators...

Some doubt about any to any voice conversion

I have listened to "0y_out_huangmeixi_error_f0" you uploaded and if I understand correctly, you probably think the style is somehow "overfitted" in the sense that it also encodes the F0 of...

Some doubt about any to any voice conversion

I don't think StyleGAN2 is relevant to StarGANv2, because the main difference in StyleGAN2 is they changed the instance normalization without the affine component (i.e., only normalize and learn the...

Some doubt about any to any voice conversion

Back to the style encoder problem, how do you encode unseen speakers if you have unshared components?

Some doubt about any to any voice conversion

@980202006 It's definitely possible to add wavelet transform to the model and it could theoretically make a big difference because the high-frequency content is what makes speech clear even though...

Some doubt about any to any voice conversion

Back to the style encoder problem, so I think you removed the shared linear layers (N of them where N is the number of speakers) and replaced them with a...

Some doubt about any to any voice conversion

@980202006 70 speakers are definitely not enough, so increasing the number of speakers probably would help. Separate projections may or may not make a difference and that's what I think...