Rishikesh (ऋषिकेश)

162 comments by Rishikesh (ऋषिकेश)

Hi @adelacvg, have you checked YODAS (https://huggingface.co/datasets/espnet/yodas)? It is a 370k-hour dataset. Some of it is poor quality, since some samples contain music and others are empty, but it still contains a lot of good-quality data...
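Empty or near-silent clips like these can be screened out with a cheap energy check before training. Below is a minimal NumPy sketch, assuming each sample has already been decoded to a mono float waveform. The function name and both thresholds are hypothetical, and this check does not detect music, which would need a separate classifier.

```python
import numpy as np


def is_usable_clip(wav: np.ndarray, sample_rate: int,
                   min_seconds: float = 1.0, min_rms_db: float = -50.0) -> bool:
    """Reject clips that are too short or (near-)silent. Thresholds are illustrative."""
    if wav.size < int(min_seconds * sample_rate):
        return False  # empty or too short to be useful
    rms = np.sqrt(np.mean(np.square(wav.astype(np.float64))))
    if rms == 0.0:
        return False  # digital silence
    # Compare loudness in dBFS against the (hypothetical) threshold.
    return 20.0 * np.log10(rms) > min_rms_db
```

A 0.1-amplitude sine tone sits around -23 dBFS and passes, while an all-zero or sub-second clip is rejected.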

@adelacvg Everyone is GPU-poor; I am also waiting for my GPU to free up. By the way, how is the TTTS training progressing? Do you have any samples to share?...

For v4, I am planning to train on Encodec features for better speaker generalization, as commented here (https://github.com/adelacvg/NS2VC/issues/16#issuecomment-2084663655). Has anyone tried this before, or would anyone like to give me any...

@adelacvg, do you have any thoughts on using Encodec's features rather than mel spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.

* Load the images
* Convert them to tensors (or wrap them in a DataLoader if you are training)
* Make sure each image has 3 dimensions, and that the 4th dimension is the batch dimension
* ...
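The checklist above can be sketched in PyTorch. This is an assumption, since the comment names no framework beyond the DataLoader. Images are assumed to arrive as H x W x C uint8 NumPy arrays, for example from `np.asarray(Image.open(path).convert("RGB"))` with Pillow. Both helper names are hypothetical.

```python
import numpy as np
import torch


def image_to_tensor(image: np.ndarray) -> torch.Tensor:
    """Convert one H x W x C (or H x W grayscale) uint8 image to a 3-D C x H x W float tensor."""
    if image.ndim == 2:
        image = image[:, :, None]  # add a channel axis for grayscale images
    # np.array copies, so read-only arrays (e.g. from PIL) are safe to wrap.
    tensor = torch.from_numpy(np.array(image)).permute(2, 0, 1)
    return tensor.float() / 255.0  # scale pixel values to [0, 1]


def images_to_batch(images) -> torch.Tensor:
    """Stack 3-D image tensors along a new leading batch dimension: (B, C, H, W)."""
    return torch.stack([image_to_tensor(img) for img in images], dim=0)
```

For a single image at inference time, `image_to_tensor(img).unsqueeze(0)` adds the batch dimension. For training, the same conversion would typically live in a `Dataset.__getitem__`, and `DataLoader` adds the batch dimension when it collates samples.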

Hi @yt605155624, they are talking about the token frame rate rather than the audio sample rate [we can map semantic tokens from 16 kHz audio to acoustic tokens from 24 kHz audio, but their respective...
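The token frame rate is the sample rate divided by the encoder's hop length, so audio at different sample rates can still yield compatible token streams. As an illustration only, assume a semantic encoder at 16 kHz with a 320-sample hop (50 tokens/s, as with HuBERT) and a 24 kHz codec with a 320-sample hop (75 tokens/s, as with EnCodec 24 kHz). The sketch below aligns semantic tokens to acoustic frames by picking the semantic frame that covers each acoustic frame's start time. The function names are hypothetical.

```python
import numpy as np


def frame_rate(sample_rate: int, hop_length: int) -> float:
    """Tokens per second produced by an encoder with this sample rate and hop."""
    return sample_rate / hop_length


def align_semantic_to_acoustic(semantic, sem_sr, sem_hop, ac_sr, ac_hop, num_acoustic_frames):
    """Upsample semantic tokens to the acoustic frame rate (nearest previous frame)."""
    t = np.arange(num_acoustic_frames)
    # Acoustic frame t starts at t * ac_hop / ac_sr seconds; the semantic frame covering
    # that instant is floor(time * sem_sr / sem_hop). Integer math avoids rounding error.
    idx = (t * ac_hop * sem_sr) // (ac_sr * sem_hop)
    idx = np.minimum(idx, len(semantic) - 1)
    return np.asarray(semantic)[idx]
```

With the 50 Hz to 75 Hz example, each group of 3 acoustic frames maps to 2 semantic frames (indices 0, 0, 1, 2, 2, 3, ...). When the ratio is an integer, this reduces to `np.repeat`.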

@yt605155624 Yes, yangdongchao/SoundStorm is an entirely different codebase; I think they have not yet implemented the SoundStorm training mechanism. My implementation is very close to the paper, though I may have made some silly mistakes...
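For readers comparing implementations, here is a rough sketch of SoundStorm-style training-time masking. It is based on my reading of the paper, not on either codebase. It picks a prompt boundary and an RVQ level q, masks a cosine-scheduled random subset of non-prompt tokens at level q, and fully masks all non-prompt tokens at finer levels. The loss is computed only on the masked level-q tokens. `MASK_ID` and the function name are hypothetical, and boundary details may differ from the paper.

```python
import numpy as np

MASK_ID = -1  # hypothetical [MASK] id; a real model reserves a vocabulary slot


def soundstorm_style_mask(tokens: np.ndarray, rng: np.random.Generator):
    """Mask a (T, Q) array of RVQ codes for one training example.

    Returns the masked tokens, a boolean loss mask, and the sampled level q.
    Coarser levels (< q) are left untouched as conditioning.
    """
    T, Q = tokens.shape
    prompt_len = int(rng.integers(0, T))      # frames [0, prompt_len) stay as prompt
    q = int(rng.integers(0, Q))               # RVQ level being trained (0 = coarsest)
    p = np.cos(rng.uniform(0.0, np.pi / 2))   # cosine schedule -> masking probability

    masked = tokens.copy()
    non_prompt = np.arange(T) >= prompt_len
    # Randomly mask non-prompt tokens at level q; the loss is computed only on these.
    level_q = (rng.random(T) < p) & non_prompt
    masked[level_q, q] = MASK_ID
    # Every finer level (> q) is fully masked outside the prompt.
    masked[non_prompt, q + 1:] = MASK_ID

    loss_mask = np.zeros((T, Q), dtype=bool)
    loss_mask[level_q, q] = True
    return masked, loss_mask, q
```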

I only trained for 100k steps, on the LibriTTS train-clean-100 subset, just as a logic check. Can you share your sample with greedy-only decoding? And which semantic encoder are you using?

@bharani-y I had already anticipated that the sampling logic might be faulty. I will re-check it. Thanks

Yeah, the audio is quite gibberish. Maybe training longer will improve the quality, or maybe greedy sampling is not a solution at all. I also got a similar result...
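One cheap way to check sampling code against greedy decoding is to compare argmax decoding with temperature-scaled top-k sampling. With k=1, the two must agree, so any mismatch points to a bug in the sampling path. Below is a minimal NumPy sketch over (T, V) logits with hypothetical function names. It covers only the per-position token choice, not the confidence-based choice of which positions to unmask in each iteration.

```python
import numpy as np


def greedy_decode(logits: np.ndarray) -> np.ndarray:
    """Pick the highest-scoring token at every position. logits: (T, V)."""
    return np.argmax(logits, axis=-1)


def top_k_sample(logits: np.ndarray, k: int, temperature: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Sample one token per position from the temperature-scaled top-k distribution."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # k-th largest value per row; everything below it is excluded (ties may keep > k).
    kth = np.sort(scaled, axis=-1)[:, -k][:, None]
    scaled = np.where(scaled < kth, -np.inf, scaled)
    # Numerically stable softmax over the surviving logits.
    probs = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(probs.shape[-1], p=row) for row in probs])
```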