Prince Canuma issues

Results: 63 issues by Prince Canuma

ChatUI improvements

- [ ] Multi-turn conversation
- [ ] Load models on the fly
- [ ] Support multiple images

good first issue

Add support for Llama-3.2-vision & Resize image

Adds support for Llama Vision and a streamlined image resize.
- [ ] Test multi-image
- [ ] Add to trainer

Closes #60

Feature Request: Add Weight Normalization Support (weight_norm)

MLX currently lacks built-in support for weight normalization, which is a crucial feature for various deep learning architectures, particularly in audio processing and generative models. Weight normalization is a reparameterization...
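For context, weight normalization in its standard formulation rewrites each weight vector as `w = g * v / ||v||`, so the magnitude `g` and the direction `v` are learned separately. The snippet below is a minimal sketch of that formula in plain Python. The helper name `weight_norm` and the list-of-rows layout are assumptions made for illustration; this is not an MLX API and not the implementation proposed in the issue.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v||, computed independently for each row of v.

    v: list of rows holding the direction parameters.
    g: list of per-row gains (the magnitude parameters).
    Hypothetical helper for illustration only; not part of MLX.
    """
    w = []
    for row, gain in zip(v, g):
        # Euclidean norm of the direction vector.
        norm = math.sqrt(sum(x * x for x in row))
        if norm == 0.0:
            raise ValueError("weight_norm is undefined for a zero row")
        w.append([gain * x / norm for x in row])
    return w


v = [[3.0, 4.0], [0.0, 2.0]]
g = [10.0, 0.5]
w = weight_norm(v, g)
```

After the reparameterization, the norm of each row of `w` equals the corresponding gain in `g`, whatever the scale of `v`.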

Fix idefics3 do-image-split

# Description
This PR matches the fix for idefics 2 and 3 on mlx-vlm. Source: https://github.com/Blaizzy/mlx-vlm/pull/191
@davidkoski @awni I'm not a Swift expert and would love to start contributing. This...

SparkTTS Voice cloning (Wav2vec)

- [x] Add Wav2vec model as STT (ASR and Audio classification)
- [ ] The torch mel seems to work better than current
- [ ] Fix model skipping the...

Can't convert sesame

Sesame's current structure doesn't allow converting the source model without changing the utils.

bug

BaseModel class with stream_generate and generate

_Originally posted by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/107#pullrequestreview-2813297998_

Dia output audio is too fast

https://x.com/eaccelerate_42/status/1916232819082494155?s=46

Dia struggles with long pauses

> The model still struggles with excessively long pauses in some situations, especially the ellipsis break used in the example here: 'But the existential risks... ' -- I wonder if...

enhancement

TTS and STS Models to port to MLX-Audio (Roadmap)

## Overview
This issue outlines our roadmap for integrating additional text-to-speech (TTS) and speech-to-speech (STS) models into the MLX-Audio library to expand our offerings beyond the current Kokoro model.

##...

good first issue