Soshyant

40 comments by Soshyant

Damn, thanks a lot for the fast response! I already closed the Kaggle environment, but I can share the notebook (it's mostly based on the official HF tutorial). I won't...


Are you sure your phonemization is correct? Don't trust bootphon's phonemizer (the one with the eSpeak backend); its output is garbage for most languages. If you're getting really bad results, most likely...
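One quick way to sanity-check phonemization is to confirm that every symbol the phonemizer emits actually exists in the model's symbol table, since a character-level text cleaner may silently drop symbols it does not know. The sketch below assumes per-character tokenization; `MODEL_SYMBOLS` is a hypothetical inventory, and you would replace it with the exact symbol list your checkpoint was trained on.

```python
from collections import Counter

# Hypothetical symbol inventory (assumption): replace with your model's actual symbols.
MODEL_SYMBOLS = set("abdefhijklmnoprstuvwzæðŋɑɔəɛɪʃʊʌʒθˈˌː ")


def unknown_symbols(phonemized, symbols=MODEL_SYMBOLS):
    """Count characters in a phonemized string that the vocabulary does not cover."""
    return Counter(ch for ch in phonemized if ch not in symbols)


def coverage(phonemized, symbols=MODEL_SYMBOLS):
    """Fraction of characters that map to a known symbol (1.0 means full coverage)."""
    if not phonemized:
        return 1.0
    known = sum(1 for ch in phonemized if ch in symbols)
    return known / len(phonemized)


# Example: 'ɜ' is missing from the hypothetical inventory above.
print(unknown_symbols("həlˈoʊ wˈɜːld"))
print(coverage("həlˈoʊ wˈɜːld"))
```

Running this over a sample of your training transcripts gives a rough picture: if many utterances report unknown symbols, the mismatch between the phonemizer and the model's vocabulary is a likely culprit for bad results.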

HiFi-GAN is essentially larger and heavier. You need to either find another checkpoint pretrained with the iSTFT decoder or train a new model yourself from scratch. You can also fine-tune on...
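Part of why an iSTFTNet-style decoder is lighter than HiFi-GAN is that it stops the learned upsampling early. The network predicts a magnitude and phase spectrogram, and a fixed (non-learned) inverse STFT produces the waveform. Below is a minimal pure-Python sketch of that last step only, using weighted overlap-add. The Hann window, `n_fft`, and hop values are illustrative assumptions, not settings from any particular checkpoint.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window of length n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, n_fft, hop):
    """One-sided STFT: a list of frames, each holding n_fft // 2 + 1 complex bins."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + t] * win[t] for t in range(n_fft)]
        frames.append([
            sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n_fft) for t in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def istft_from_mag_phase(mags, phases, n_fft, hop, length):
    """Weighted overlap-add inverse STFT from per-frame magnitude and phase."""
    win = hann(n_fft)
    out = [0.0] * length
    norm = [0.0] * length
    for f, (mag, phase) in enumerate(zip(mags, phases)):
        half = [m * cmath.exp(1j * p) for m, p in zip(mag, phase)]
        # Rebuild the full spectrum from the one-sided half via Hermitian symmetry
        full = half + [half[n_fft - k].conjugate() for k in range(n_fft // 2 + 1, n_fft)]
        start = f * hop
        for t in range(n_fft):
            if start + t >= length:
                break
            sample = sum(
                full[k] * cmath.exp(2j * math.pi * k * t / n_fft) for k in range(n_fft)
            ).real / n_fft
            out[start + t] += sample * win[t]
            norm[start + t] += win[t] ** 2
    # Divide by the summed squared window wherever at least one frame contributed
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Round trip: analyze a test tone, split into magnitude/phase, then resynthesize.
signal = [math.sin(2 * math.pi * 3 * i / 64) + 0.3 * math.cos(2 * math.pi * 7 * i / 64)
          for i in range(64)]
spec = stft(signal, n_fft=16, hop=4)
mags = [[abs(c) for c in frame] for frame in spec]
phases = [[cmath.phase(c) for c in frame] for frame in spec]
recon = istft_from_mag_phase(mags, phases, n_fft=16, hop=4, length=len(signal))
```

In a real decoder the magnitudes and phases come from the network rather than from `stft`. The first sample is not recovered here because the periodic Hann window is exactly zero at its first tap.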

You're welcome. As I've said, your choice of max_len or the dataset shouldn't matter; the decoder is the one choice with a large impact.

Unless you change the decoder or use very short samples with LFInference, there shouldn't be much latency overhead.
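To check claims like this on your own setup, you can measure the real-time factor (synthesis time divided by audio duration) for different decoders and clip lengths. Fixed per-call overhead weighs more heavily on short clips. The sketch below is framework-agnostic: `synthesize` is a hypothetical callable mapping text to audio samples, and the 24 kHz default sample rate is an assumption you should match to your model.

```python
import time


def real_time_factor(elapsed_s, num_samples, sample_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return elapsed_s / (num_samples / sample_rate)


def measure_rtf(synthesize, text, sample_rate=24000, warmup=1, runs=3):
    """Mean real-time factor of a synthesis callable (hypothetical: text -> samples)."""
    for _ in range(warmup):
        synthesize(text)  # keep one-off costs (lazy init, caching) out of the timing
    rtfs = []
    for _ in range(runs):
        start = time.perf_counter()
        audio = synthesize(text)
        elapsed = time.perf_counter() - start
        rtfs.append(real_time_factor(elapsed, len(audio), sample_rate))
    return sum(rtfs) / len(rtfs)


# Example with a stand-in synthesizer that returns one second of silence.
print(measure_rtf(lambda text: [0.0] * 24000, "hello"))
```

If the model runs on a GPU, call `torch.cuda.synchronize()` before reading the clock. CUDA kernels launch asynchronously, so the timing would otherwise undercount the work.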

@Paulmzr Thank you very much for taking the time to answer me; I really appreciate it. I agree 100% that Libri in general is not high quality. Since my domain...

> Hi, [@Respaired](https://github.com/Respaired)
>
> [SLED-TTS/sled/sled.py](https://github.com/ictnlp/SLED-TTS/blob/b8ed10d9953160efd8a0538b4ea5af80a57c9e96/sled/sled.py#L60), line 60 in [b8ed10d](/ictnlp/SLED-TTS/commit/b8ed10d9953160efd8a0538b4ea5af80a57c9e96):
>
>     self.z_proj = nn.Linear(self.token_embed_dim, self.hidden_size, bias=True)
>
> I think this line is unnecessary if the dimensions are...

@Paulmzr Looks like NVIDIA's 44.1 kHz audio codec has a dimension of 32. It's slightly slower than DAC and moderately slower than EnCodec, but it seems like a very good alternative to...

I think he said pre-VQ latents are too smooth, IIRC. The MLP in this model is very picky about which codec you can use. I had very...
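For context on "pre-VQ" versus quantized latents: a codec encoder outputs a continuous latent per frame (pre-VQ), and vector quantization snaps each frame to its nearest codebook entry. A slowly drifting continuous trajectory therefore becomes a step-like sequence of codes. Below is a minimal single-stage sketch with a hypothetical 2-D codebook. Real codecs such as DAC and EnCodec stack several residual VQ stages, and their codebooks are learned.

```python
def nearest_code(vector, codebook):
    """Return (index, code) of the codebook entry closest in squared Euclidean distance."""
    best = min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )
    return best, codebook[best]


def quantize(latents, codebook):
    """Map each continuous (pre-VQ) frame to its nearest code (post-VQ)."""
    return [nearest_code(frame, codebook)[1] for frame in latents]


# Hypothetical codebook and a smoothly drifting pre-VQ trajectory.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
pre_vq = [[0.1, -0.1], [0.4, 0.5], [0.6, 0.7]]
post_vq = quantize(pre_vq, codebook)
print(post_vq)  # the small step from frame 2 to frame 3 becomes a jump to a new code
```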