Rishikesh (ऋषिकेश) comments

Results: 160 comments of Rishikesh (ऋषिकेश)


Problems with SoundStorm

Yes, greedy is not enough, because this task is a many-to-many conversion, which is very hard for single-run greedy solutions. My training loss is around 5 (ideally loss...

@bharani-y Teacher training is a knowledge distillation method, also called teacher-student learning, where we train a large teacher model to learn the probability distribution of the complex data,...
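For readers unfamiliar with teacher-student learning, here is a minimal sketch of the commonly used distillation loss: the KL divergence between temperature-softened teacher and student distributions. This is the standard textbook formulation, not necessarily what the comment above goes on to describe (the comment is truncated). The function names and the default `temperature` are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    The result is scaled by temperature**2, the usual convention for keeping
    gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2
```

The loss is zero when the student's logits match the teacher's and grows as the two distributions diverge. In practice it is usually mixed with the ordinary cross-entropy loss on ground-truth labels.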

Problems with SoundStorm

> PS. I dived into some of the NAR (non-autoregressive) machine translation papers and the consensus was that training a NAR model (and they use even more "tricks" than SoundStorm)...

Problems with SoundStorm

@bharani-y are you using the same `dataloader` and `variable random window` as in my repo? Also, which semantic tokenizer are you using, and how many clusters are in your semantic dataset?

Problems with SoundStorm

Currently I am training the model on the large LibriTTS dataset from here: https://huggingface.co/datasets/collabora/whisperspeech/tree/main

Problems with SoundStorm

> From my experiment I can confirm that very naive upsampling of the semantic tokens from HuBERT (50 Hz) to match EnCodec (75 Hz) works and gives some reasonable voices. My own...
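One possible reading of "very naive upsampling" is nearest-neighbour repetition of token ids, so that a 50 Hz sequence lines up frame for frame with a 75 Hz one. The sketch below assumes that reading; the quoted comment does not show its actual code, and `upsample_tokens` is a hypothetical helper name.

```python
def upsample_tokens(tokens, src_hz=50, dst_hz=75):
    """Stretch a token sequence from src_hz to dst_hz by repeating ids.

    Output frame i takes the source token covering the same time position,
    i.e. source index floor(i * src_hz / dst_hz).
    """
    n_out = len(tokens) * dst_hz // src_hz
    return [tokens[i * src_hz // dst_hz] for i in range(n_out)]


# Four semantic frames at 50 Hz (80 ms) become six frames at 75 Hz.
semantic = [11, 22, 33, 44]
print(upsample_tokens(semantic))
```

Token ids are discrete cluster labels, so they are repeated rather than interpolated. With the default 50 Hz to 75 Hz ratio, every second source token is emitted twice.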

Problems with SoundStorm

> Below I provide the core code and one sample, which I think is very close to the paper's description
> https://github.com/feng-yufei/shared_debugging_code/blob/main/soundstorm.py, hope it can be useful

@feng-yufei sample is...

Problems with SoundStorm

> For an experiment on a larger dataset I tried LibriTTS 100/360/500 merged together, and the quality is strangely bad (50% top-10 training accuracy, while LJSpeech has 65%). I have also trained on...
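For reference, "top-10 accuracy" is usually taken to mean the fraction of positions where the correct token is among the 10 highest-scoring predictions. A minimal sketch under that assumption follows; the quoted comment does not show how its numbers were computed, and `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(logits, targets, k=10):
    """Fraction of rows whose target index is among the k largest logits."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = 0
    for row, target in zip(logits, targets):
        # Indices of the row sorted by score, highest first, cut to k entries.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


scores = [[0.1, 0.9, 0.0], [0.8, 0.15, 0.05]]
labels = [1, 2]
print(top_k_accuracy(scores, labels, k=1))
```

Here each row of `scores` holds the model's scores over the codebook for one masked position, and `labels` holds the ground-truth token ids.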

Make it multi-language?

@p0p4k sample sounds good, I think with more training it will get a lot better. I think multilinguality is easy to implement in this repo. I think the problem occurs when you...

Training Code availability

Currently facing this issue:
```
run.sh: line 37: utils/parse_options.sh: No such file or directory
Prepare LibriTTS dataset
split the data for 1 GPUs
cat: data/val/wav.scp: No such file or...
```