김강욱 (Kang-wook Kim)

Results: 8 comments by 김강욱 (Kang-wook Kim)

Hi, sorry for the late response. 1. Yes, it would be better if there were a phoneme recognizer that takes speech and a transcript as input. But I've never seen...
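For context, a minimal sketch of what such a recognizer could look like: a CTC-trained model that consumes mel-spectrogram frames as the speech input and uses the phoneme transcript as the CTC target. The `PhonemeRecognizer` class, its layer sizes, and the dummy tensors below are all hypothetical, not part of this project.

```python
# Hypothetical sketch of a phoneme recognizer that takes speech
# (mel-spectrogram frames) and a phoneme transcript; not part of this repo.
import torch
import torch.nn as nn

class PhonemeRecognizer(nn.Module):
    def __init__(self, n_mels=80, n_phonemes=70, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # +1 output class for the CTC blank symbol (index 0).
        self.proj = nn.Linear(2 * hidden, n_phonemes + 1)

    def forward(self, mel):                       # mel: (batch, frames, n_mels)
        h, _ = self.rnn(mel)
        return self.proj(h).log_softmax(dim=-1)   # frame-wise phoneme log-probs

# Training step: the transcript (phoneme IDs) is the CTC target, so the model
# learns frame-level phoneme posteriors without a manual alignment.
model = PhonemeRecognizer()
ctc = nn.CTCLoss(blank=0)

mel = torch.randn(2, 400, 80)                     # dummy speech features
transcript = torch.randint(1, 71, (2, 50))        # dummy phoneme IDs (1..70)
log_probs = model(mel).transpose(0, 1)            # CTC expects (frames, batch, classes)
input_lens = torch.full((2,), 400, dtype=torch.long)
target_lens = torch.full((2,), 50, dtype=torch.long)
loss = ctc(log_probs, transcript, input_lens, target_lens)
loss.backward()
```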

Hi, I didn't understand the "one-vs-rest model" you mentioned. Could you explain in more detail what you mean by it? You can use this project for "any-to-many" or "many-to-many" conversion...

@AK391 Thanks for your interest! Currently, we don't have a specific plan to release the code for that paper. We will add the link to the paper and demo page...

@iehppp2010 Hi. Your alignment encoder, Cotatron, doesn't seem to be working properly. As explained in the paper, we transferred Cotatron from pre-trained weights, which were trained on LibriTTS...
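For reference, a minimal sketch of how such a transfer of pre-trained weights is typically done in PyTorch; the checkpoint path and the "state_dict" key are assumptions, so adjust them to the repository's actual checkpoint format.

```python
# Hedged sketch: load pre-trained Cotatron weights into the alignment encoder
# and freeze them before training the rest of the model. The checkpoint path
# and the "state_dict" key are assumptions, not the repo's exact format.
import torch
import torch.nn as nn

def load_pretrained_cotatron(cotatron: nn.Module,
                             ckpt_path: str = "cotatron_libritts_pretrained.ckpt") -> nn.Module:
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    # strict=False reports mismatched keys instead of raising.
    missing, unexpected = cotatron.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing, "| unexpected keys:", unexpected)
    # Freeze Cotatron so only the rest of the conversion model is updated.
    for p in cotatron.parameters():
        p.requires_grad = False
    return cotatron
```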

@iehppp2010 Yes. You first have to fine-tune the Cotatron model on the singing dataset, because the average duration of each phoneme is much longer in the singing dataset. It would generate...
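One quick way to see why fine-tuning is needed is to compare average phoneme durations between the two datasets. The sketch below uses dummy duration lists; in practice the per-phoneme durations would come from the alignments produced on each dataset.

```python
# Rough check of the duration mismatch between read speech and singing.
# The duration lists are dummy values, not measurements from any dataset.
import statistics

speech_durations = [6, 8, 5, 9, 7, 10, 6]         # frames per phoneme (dummy)
singing_durations = [25, 40, 18, 60, 33, 22, 48]  # frames per phoneme (dummy)

print("speech mean :", statistics.mean(speech_durations), "frames/phoneme")
print("singing mean:", statistics.mean(singing_durations), "frames/phoneme")

# A large ratio means the alignment learned on speech will not cover singing
# well, so Cotatron should be fine-tuned on the singing data first.
ratio = statistics.mean(singing_durations) / statistics.mean(speech_durations)
print("singing/speech duration ratio:", round(ratio, 2))
```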

Hi. In my case, I tried using `distributed_backend='ddp'` as that warning recommended. However, a multi-GPU training error occurs in the following situations: - when the first GPU (i.e., ID 0) is...
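As a stopgap, remapping the visible devices with `CUDA_VISIBLE_DEVICES` usually sidesteps the case where the selected GPUs do not start at ID 0. The sketch below follows the older PyTorch Lightning API (`distributed_backend`, `gpus`); which exact PL version this repository pins is an assumption, so check the trainer arguments against it.

```python
# Hedged workaround sketch: expose only the physical GPUs you want, so they
# appear to the process as IDs starting from 0. Trainer arguments follow the
# older PyTorch Lightning API and may differ in the version pinned here.
import os

# Physical GPUs 2 and 3 become device 0 and 1 inside this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,                     # the two remapped devices
    distributed_backend="ddp",  # what the warning recommends over 'dp'
)
# trainer.fit(model)  # `model` is your LightningModule
```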

To completely solve this problem, we need to upgrade the PyTorch Lightning package. However, there are conflicts between PL versions, so we plan to check...

I pinned this issue. Thanks for letting us know!