Rishikesh (ऋषिकेश)

Results 160 comments of Rishikesh (ऋषिकेश)

Another issue:

```
Traceback (most recent call last):
  File "/content/UniAudio/UniAudio/egs/TTS/../../infer.py", line 17, in <module>
    from utils.dataloader import get_data_iterator_tokenizer_vocabulary
  File "/content/UniAudio/UniAudio/utils/dataloader.py", line 26, in <module>
    from tools.tokenizer.AudioTagging.audio_tagging_tokenizer import AudioTaggingTokenizer
ModuleNotFoundError: No module...
```

A couple more minor bugs:

1. String quotes are inconsistent: https://github.com/yangdongchao/UniAudio/blob/0552aa3faa0314e87641f8cf4176975d95670814/UniAudio/tools/tokenizer/soundstream/AudioTokenizer.py#L43
```
self.ckpt_path = f'UniAudio/checkpoints/{tag}_model/model.pth'
```
2. This line should be commented out: https://github.com/yangdongchao/UniAudio/blob/0552aa3faa0314e87641f8cf4176975d95670814/UniAudio/tools/tokenizer/soundstream/AudioTokenizer.py#L124

As this repo is still a work in progress, some minor bugs are understandable. My focus is currently on HiFi-Codec, as I am testing that. But yes, there are some...

@zhengkw18 any update?

Hi @jasonppy, I am finetuning the 330M TTS model on multi-lingual data, and here is the TensorBoard: ![image](https://github.com/jasonppy/VoiceCraft/assets/4656872/5e431612-2abd-410b-982d-d37bc614ffb3) With finetuning on a single A6000 with a max number of tokens of 10K...

@jasonppy Yes, at 52-55% it starts producing good voice. I can also tell you that it works great with multi-lingual data. I finetuned this on data in 3 languages, and even when...

Hi @jasonppy, for multi-lingual I don't do anything extra; I just rely on Espeak-ng phonemes. Create a dataset based on Phonemizer's language-specific phonemes, mix all the multi-lingual datasets, and mix the...
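The recipe described here (phonemize each language with its own Espeak-ng voice, then pool everything into one training set) could be sketched roughly as below. This is only an illustration, not the actual preprocessing used for the finetuning run: `espeak_phonemize`, `build_mixed_dataset`, the row layout, and the language codes in the usage note are all hypothetical, and the default phonemizer assumes the `phonemizer` package and Espeak-ng are installed.

```python
import random
from typing import Callable, Dict, List, Tuple

# One utterance: (audio_path, raw_text). Hypothetical layout for illustration.
Utterance = Tuple[str, str]
# One output row: (audio_path, language_code, phoneme_string).
Row = Tuple[str, str, str]


def espeak_phonemize(text: str, language: str) -> str:
    """Phonemize text with the phonemizer package's espeak backend.

    Assumes `phonemizer` and Espeak-ng are installed; the import is lazy so
    the rest of this sketch can run without them.
    """
    from phonemizer import phonemize

    return phonemize(text, language=language, backend="espeak", strip=True)


def build_mixed_dataset(
    corpora: Dict[str, List[Utterance]],
    phonemize_fn: Callable[[str, str], str] = espeak_phonemize,
    seed: int = 0,
) -> List[Row]:
    """Phonemize each corpus with its own language code, then pool and shuffle."""
    rows: List[Row] = []
    for language, utterances in corpora.items():
        for audio_path, text in utterances:
            # Each language is phonemized with its own voice/rules.
            rows.append((audio_path, language, phonemize_fn(text, language)))
    # A seeded shuffle keeps the mixed ordering reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows
```

Calling `build_mixed_dataset({"en-us": [...], "hi": [...]})` would yield one shuffled list of rows covering every language. Passing a different `phonemize_fn` makes it easy to swap in another phonemizer or a stub for testing.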

But I think when we include lots of languages and accents, it might not work as intended, because many IPA phonemes are shared between languages, so it might be needed...

@thivux Yes, it's sensitive to the hyperparameters, but it gives good performance with certain settings.

Nope, I only finetuned the model; training from scratch would require too much data and compute. I listened to the files generated from a diverse set of paragraphs and listen, is...