Rishikesh (ऋषिकेश) comments

Results: 160 comments by Rishikesh (ऋषिकेश)

Training Code availability

Another issue:
```
Traceback (most recent call last):
  File "/content/UniAudio/UniAudio/egs/TTS/../../infer.py", line 17, in <module>
    from utils.dataloader import get_data_iterator_tokenizer_vocabulary
  File "/content/UniAudio/UniAudio/utils/dataloader.py", line 26, in <module>
    from tools.tokenizer.AudioTagging.audio_tagging_tokenizer import AudioTaggingTokenizer
ModuleNotFoundError: No module...
```

Training Code availability

A couple more minor bugs:
1. String quotes are inconsistent: https://github.com/yangdongchao/UniAudio/blob/0552aa3faa0314e87641f8cf4176975d95670814/UniAudio/tools/tokenizer/soundstream/AudioTokenizer.py#L43
```
self.ckpt_path = f'UniAudio/checkpoints/{tag}_model/model.pth'
```
2. This line should be commented out: https://github.com/yangdongchao/UniAudio/blob/0552aa3faa0314e87641f8cf4176975d95670814/UniAudio/tools/tokenizer/soundstream/AudioTokenizer.py#L124

Some minor bugs inside Hifi-Codec code

As this repo is still a work in progress, some minor bugs are understandable. My focus is currently on HiFi-Codec, as I am testing that. But yes, there are some...

Code release inquiry

@zhengkw18 any update?

Finetuning

Hi @jasonppy, I am finetuning the 330M TTS model on multilingual data, and here is the TensorBoard plot: ![image](https://github.com/jasonppy/VoiceCraft/assets/4656872/5e431612-2abd-410b-982d-d37bc614ffb3) With finetuning on a single A6000 with a max number of tokens of 10K...

Finetuning

@jasonppy Yes, 52-55% starts producing good voice. I should also let you know that it works great with multilingual data. I finetuned this on data in 3 languages and even when...

Finetuning

Hi @jasonppy, for multilingual data I don't do anything extra; I just rely on espeak-ng phonemes. Create a dataset based on Phonemizer's language-specific phonemes, mix all the multilingual datasets, and mix the...
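
The approach described above (phonemize each language's text with its own espeak-ng voice, then pool everything into one training set) could be sketched roughly as follows. This is only an illustration, not the commenter's actual script: `espeak_phonemize` and `build_mixed_dataset` are hypothetical names, and `espeak_phonemize` assumes the `phonemizer` package with the espeak-ng backend is installed.

```python
import random
from typing import Callable, Dict, List, Tuple


def espeak_phonemize(text: str, language: str) -> str:
    """Phonemize one string with espeak-ng via phonemizer (assumed installed)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from phonemizer import phonemize
    return phonemize(text, language=language, backend="espeak", strip=True)


def build_mixed_dataset(
    corpora: Dict[str, List[str]],
    phonemize_fn: Callable[[str, str], str] = espeak_phonemize,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return shuffled (language, phonemes) pairs pooled from every corpus."""
    mixed = [
        (lang, phonemize_fn(text, lang))
        for lang, texts in corpora.items()
        for text in texts
    ]
    # A fixed seed keeps the mixing order reproducible across runs.
    random.Random(seed).shuffle(mixed)
    return mixed
```

Here `corpora` maps an espeak-ng language code (for example `"en-us"`) to a list of transcripts. Passing a different `phonemize_fn` makes it possible to try the mixing step without espeak-ng installed.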

Finetuning

But I think that when we include lots of languages and accents, it might not work as intended, because many IPA phonemes are shared between the languages, so it might be needed...
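
To get a rough sense of how much two languages share phoneme symbols, one could compare the symbol sets that appear in each language's phonemized text. This is a minimal sketch under stated assumptions: `symbol_inventory` and `phoneme_overlap` are hypothetical helpers, the input is assumed to be already-phonemized strings, and treating each character as one symbol is a simplification (multi-character IPA phonemes and diacritics get split).

```python
from typing import List, Set


def symbol_inventory(phonemized: List[str]) -> Set[str]:
    """Collect the set of non-space characters used in phonemized strings."""
    return {ch for line in phonemized for ch in line if not ch.isspace()}


def phoneme_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap between the symbol inventories of two languages."""
    inv_a, inv_b = symbol_inventory(a), symbol_inventory(b)
    union = inv_a | inv_b
    # Two empty inventories are treated as having no overlap.
    return len(inv_a & inv_b) / len(union) if union else 0.0
```

A value near 1.0 would mean the two languages use almost the same symbols, which is the situation the comment is concerned about.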

Finetuning

@thivux Yes, it's sensitive to the hyperparameters, but it gives good performance with certain settings.

Finetuning

Nope, I only finetune the model; training from scratch would require too much data and compute. I listened to the files generated from a diverse set of paragraphs and listen, is...