Soshyant comments

Results: 40 comments of Soshyant

IndexError: index out of range in self | Issue

Damn, thanks a lot for the fast response! I already closed the Kaggle env, but I can share the notebook (it's mostly based on the official HF tutorial) I won't...

IndexError: index out of range in self | Issue

> Damn, thanks a lot for the fast response! I already closed the Kaggle env, but I can share the notebook (it's mostly based on the official HF tutorial) I...

Trained StyleTTS2 for Hindi but didn't get good results

Are you sure your phonemization is correct? Don't trust bootphon's Phonemizer (the one with the Espeak backend). It is garbage for most languages. If you're getting really bad results, most likely...

Inference latency

HifiGAN is essentially larger and heavier. You need to either find another ckpt pretrained on ISTFT or train a new model yourself from scratch. You can also fine-tune on...

Inference latency

You're welcome. As I've said, your choice of max_len or the dataset shouldn't matter. The decoder has the largest impact.

Inference latency

Unless you change the decoder, or use very short samples with LFInference, there shouldn't be much latency overhead.

a Few questions about Scaling, Non-verbal cues, etc.

@Paulmzr Thank you very much for taking the time to answer me; I really appreciate it. I agree 100% that Libri in general is not high quality. Since my domain...

a Few questions about Scaling, Non-verbal cues, etc.

> Hi, [@Respaired](https://github.com/Respaired)
>
> [SLED-TTS/sled/sled.py](https://github.com/ictnlp/SLED-TTS/blob/b8ed10d9953160efd8a0538b4ea5af80a57c9e96/sled/sled.py#L60)
>
> Line 60 in [b8ed10d](/ictnlp/SLED-TTS/commit/b8ed10d9953160efd8a0538b4ea5af80a57c9e96)
> `self.z_proj = nn.Linear(self.token_embed_dim, self.hidden_size, bias=True)`
>
> I think this line is unnecessary, if the dimensions are...

a Few questions about Scaling, Non-verbal cues, etc.

@Paulmzr Looks like Nvidia's 44.1 kHz audio codec has a dim of 32. It's slightly slower than DAC and moderately slower than EnCodec, but seems like a very good alternative to...

a Few questions about Scaling, Non-verbal cues, etc.

I think he said pre-VQ latents are too smooth, iirc. The MLP in this model is very picky when it comes to what codec you can use. I had very...