Senstella

Results 13 comments of Senstella

@Blaizzy I just noticed IndexTTS uses a non-standard BigVGAN, which incorporates conditioning inside BigVGAN using an ECAPA-TDNN model, such as: ```diff for i in range(self.num_upsamples): # Upsampling for i_up in...

Thank you! I just subclassed `BigVGAN` in `indextts/bigvgan.py` as `BigVGANConditioning` and applied conditioning there like:

```py
class BigVGANConditioning(BigVGAN):
    def __init__(self, config: BigVGANConditioningConfig):
        super().__init__(config)
```

Also, I've implemented another instance of...

I think it's ready for review!

https://huggingface.co/mlx-community/IndexTTS-1.5
https://huggingface.co/mlx-community/IndexTTS

```bash
python -m mlx_audio.tts.generate --model mlx-community/IndexTTS-1.5 --text "Describe this image." --ref_audio test.wav
```

(The model depends on given reference audio, so must...

> I noticed the WER in English seems quite a bit higher than they advertise in their results -- is that the case with the torch model as well? It...

> When you generate via python script, isn't ref_audio supposed to accept file name, not mlx array? I think other models accept file names if I'm not mistaken? I followed...

I've added a `conv.py` file with the `WNConv1d` and `WNConvTranspose1d` classes from `descript/nn/layers.py`, and I've also updated the weights on HF. Should be all set now!

The test should pass now. The problem was that `WNConvTranspose1d` calls `mx.conv_transpose1d` with these parameters:

```py
y = mx.conv_transpose1d(
    x, weight, self.stride, self.padding, self.dilation, self.groups
)
```

while `mx.conv_transpose1d` accepts: ```py (input, weight,...
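
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python. The function `conv_transpose1d_stub` and its `extra` parameter are hypothetical stand-ins, not the real `mx.conv_transpose1d` signature (which is cut off in the comment above). The sketch assumes the problem was a parameter sitting between `dilation` and `groups`, so that a positional `groups` lands in the wrong slot.

```python
# Hypothetical stand-in: NOT the real mx.conv_transpose1d signature.
# `extra` represents an assumed parameter placed before `groups`.
def conv_transpose1d_stub(x, weight, stride=1, padding=0, dilation=1,
                          extra=0, groups=1):
    """Return the bound arguments so any misbinding is visible."""
    return {
        "stride": stride,
        "padding": padding,
        "dilation": dilation,
        "extra": extra,
        "groups": groups,
    }


# Positional call written for a (x, weight, stride, padding, dilation, groups)
# ordering: the sixth value was meant to be `groups`.
positional = conv_transpose1d_stub("x", "w", 2, 1, 1, 4)

# Keyword call: each value binds to the intended parameter regardless of order.
keyword = conv_transpose1d_stub("x", "w", stride=2, padding=1,
                                dilation=1, groups=4)

print(positional)  # the value 4 ends up in `extra`, `groups` keeps its default
print(keyword)     # the value 4 ends up in `groups`
```

Passing everything after `x` and `weight` by keyword is one way to make such a call robust to signature changes, at the cost of slightly longer call sites.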

I think [MLX only quantizes modules that implement `to_quantized(group_size, bits)`](https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.quantize.html#mlx.core.quantize). However, the BigVGAN codec doesn't contain any `Linear` or `Embedding` layers. And [`Conv1d` layers don't have a `to_quantized` method](https://ml-explore.github.io/mlx/build/html/python/nn/_autosummary/mlx.nn.Conv1d.html), so in theory the...
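
The selection rule described above can be sketched in plain Python. All classes below (`Linear`, `QuantizedLinear`, `Conv1d`) and the helper `quantize_modules` are hypothetical stand-ins, not MLX code. The sketch only assumes that a module is swapped for a quantized version when it exposes `to_quantized`, and left untouched otherwise.

```python
class QuantizedLinear:
    """Hypothetical quantized replacement that records its settings."""
    def __init__(self, group_size, bits):
        self.group_size = group_size
        self.bits = bits


class Linear:
    """Hypothetical layer that knows how to quantize itself."""
    def to_quantized(self, group_size, bits):
        return QuantizedLinear(group_size, bits)


class Conv1d:
    """Hypothetical layer with no `to_quantized` method."""


def quantize_modules(modules, group_size=64, bits=4):
    """Replace only the modules that implement `to_quantized`."""
    result = {}
    for name, module in modules.items():
        if hasattr(module, "to_quantized"):
            result[name] = module.to_quantized(group_size, bits)
        else:
            result[name] = module  # left as-is
    return result


# A conv-only model (like the codec described above) passes through unchanged.
codec = {"conv_pre": Conv1d(), "conv_post": Conv1d()}
quantized_codec = quantize_modules(codec)

# A model with a Linear layer gets that layer replaced.
mixed = {"proj": Linear(), "conv": Conv1d()}
quantized_mixed = quantize_modules(mixed)
```

Under this assumption, quantizing a model made only of convolution layers is a no-op, which matches the reasoning in the comment.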

I just included the original `wetext` normalizer, but please let me know if you prefer otherwise!

> However, I wonder if it's possible to normalize the text without adding extra dependencies? Because this will keep us light and make the swift port easier as well. I...