Rishikesh (ऋषिकेश) comments

Results: 162 comments by Rishikesh (ऋषिकेश)

branch in V4 version train it's working ?

Hi @adelacvg, have you checked YODAS (https://huggingface.co/datasets/espnet/yodas), a 370k-hour dataset? The data quality is poor, since some samples contain music or are empty, but it still has good-quality data...

branch in V4 version train it's working ?

@adelacvg Everyone is GPU-poor; I am also waiting for my GPU to be vacated. By the way, how is the progress with TTTS training? Do you have any samples to share?...

branch in V4 version train it's working ?

For v4, I am planning to train on Encodec features for better speaker generalization, as commented here: https://github.com/adelacvg/NS2VC/issues/16#issuecomment-2084663655 . Has anyone tried this before, or would anyone like to give me any...

Issues with preserving the speaker identity

@adelacvg, do you have any thoughts on using Encodec's features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that leads to better generalization.

image input

* Load images
* Convert them to a tensor, or to a dataloader if you are training
* Make sure each image has 3 dimensions and that the 4th dimension is the batch dim
*...
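The steps above can be sketched roughly as follows. This is a minimal illustration using NumPy rather than any particular training framework; the helper name `to_batch`, the synthetic zero-filled images, and the choice to place the batch axis first (N x H x W x C) are all assumptions made for the example, not something stated in the comment.

```python
import numpy as np

def to_batch(images):
    """Stack equally sized 3-D images into one 4-D array (hypothetical helper).

    Assumption: the batch axis is placed first, giving N x H x W x C.
    """
    arrays = [np.asarray(img, dtype=np.float32) for img in images]
    for i, arr in enumerate(arrays):
        # Each image must be 3-D (height, width, channels) before batching.
        if arr.ndim != 3:
            raise ValueError(f"image {i} has {arr.ndim} dimensions, expected 3")
    # np.stack adds the new (batch) axis and requires identical shapes.
    return np.stack(arrays, axis=0)

# Synthetic stand-ins for images that would normally be loaded from disk.
images = [np.zeros((32, 48, 3), dtype=np.uint8) for _ in range(4)]
batch = to_batch(images)
single = to_batch(images[:1])  # a single image still gets a batch axis

print(batch.shape, single.shape)
```

A single image passed this way still ends up 4-dimensional, which is usually what a model expecting batched input needs.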

Problems with SoundStorm

Hi @yt605155624, they are talking about the token frame rate rather than the audio sample rate [we can use 16 kHz semantic tokens with 24 kHz acoustic tokens, but their respective...
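To make the distinction between token frame rate and audio sample rate concrete, here is a small arithmetic sketch. The function names and the hop sizes (320 and 480 samples per token) are illustrative assumptions, not values taken from the comment or from any specific tokenizer.

```python
def token_frame_rate(sample_rate_hz, hop_samples):
    """Tokens per second produced by a tokenizer that emits one token per hop."""
    return sample_rate_hz / hop_samples

def tokens_for(duration_s, sample_rate_hz, hop_samples):
    """Approximate number of tokens for a clip of the given duration."""
    return int(duration_s * token_frame_rate(sample_rate_hz, hop_samples))

# Hypothetical tokenizers: the sample rates differ, yet the frame rates can match.
semantic_rate = token_frame_rate(16_000, 320)   # 16 kHz audio, hop of 320 samples
acoustic_rate = token_frame_rate(24_000, 480)   # 24 kHz audio, hop of 480 samples
mismatched_rate = token_frame_rate(24_000, 320) # same audio, smaller hop

print(semantic_rate, acoustic_rate, mismatched_rate)
```

Under these assumed hops the first two streams produce the same number of tokens per second despite different sample rates, while the third does not and would need its sequence lengths reconciled.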

Problems with SoundStorm

@yt605155624 Yes, yangdongchao/SoundStorm is an entirely different codebase; I think they have not yet implemented the SoundStorm training mechanism. My implementation is very close to the paper. I may have made some silly mistake...

Problems with SoundStorm

I trained for only 100k and used the LibriTTS-100clean dataset just as a logic check. Can you share your sample with greedy-only decoding? And which semantic encoder are you using?

Problems with SoundStorm

@bharani-y I already suspected that the sampling logic might be faulty. I will re-check it. Thanks.

Problems with SoundStorm

Yeah, the audio is quite gibberish. Maybe training longer will improve the quality, or maybe greedy sampling is not a solution at all. I have also gotten similar results...
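For readers unfamiliar with the distinction being discussed, the following is a generic sketch of greedy decoding versus temperature sampling over a single vector of logits. It is not the sampler from the repository under discussion, nor the confidence-based scheme from the SoundStorm paper; the function names and the toy logits are assumptions for illustration only.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always return the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)

def sample_pick(logits, temperature=1.0, rng=random):
    """Stochastic decoding: draw an index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.1]        # toy values, not model output
rng = random.Random(0)          # seeded for repeatability
greedy = [greedy_pick(logits) for _ in range(5)]
sampled = [sample_pick(logits, 1.0, rng) for _ in range(200)]

print(greedy, sorted(set(sampled)))
```

Greedy decoding returns the same token every time for the same logits, whereas sampling spreads choices across tokens in proportion to their probability, which is one reason the two can sound very different.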