Prince Canuma
- [ ] Multi-turn conversation
- [ ] Load models on the fly
- [ ] Support multiple images
Adds support for Llama Vision and streamlines image resizing.
- [ ] Test multi-image
- [ ] Add to trainer

Closes #60
MLX currently lacks built-in support for weight normalization, which is a crucial feature for various deep learning architectures, particularly in audio processing and generative models. Weight normalization is a reparameterization...
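As a rough illustration of the reparameterization mentioned above, weight normalization is commonly written as w = g · v / ‖v‖, so that the magnitude `g` and the direction `v` of each weight vector are learned separately. The sketch below is plain Python, not MLX code; the function name `weight_norm`, the argument names, and the choice of one norm per output row are assumptions made for illustration only.

```python
import math

def weight_norm(v, g):
    """Hypothetical helper: return w = g * v / ||v||, one norm per row of v.

    v: list of rows (direction parameters), each row assumed non-zero.
    g: list of per-row gains (magnitude parameters).
    """
    w = []
    for row, gain in zip(v, g):
        # Euclidean norm of this row of the direction matrix
        norm = math.sqrt(sum(x * x for x in row))
        # Rescale the row so that its norm equals the gain
        w.append([gain * x / norm for x in row])
    return w

v = [[3.0, 4.0], [1.0, 0.0]]
g = [10.0, 2.0]
w = weight_norm(v, g)
```

After this call, each row of `w` points in the same direction as the matching row of `v`, and its length equals the matching entry of `g`.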
# Description
This PR matches the fix for idefics 2 and 3 on mlx-vlm.

Source: https://github.com/Blaizzy/mlx-vlm/pull/191

@davidkoski @awni I'm not a Swift expert and would love to start contributing. This...
- [x] Add Wav2vec model as STT (ASR and audio classification)
- [ ] The torch mel seems to work better than the current one
- [ ] Fix model skipping the...
Sesame's current structure doesn't allow converting the source model without changing the utils.
_Originally posted by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/107#pullrequestreview-2813297998_
https://x.com/eaccelerate_42/status/1916232819082494155?s=46
> The model still struggles with excessively long pauses in some situations, especially the ellipsis break used in the example here: 'But the existential risks... ' -- I wonder if...
## Overview
This issue outlines our roadmap for integrating additional text-to-speech (TTS) and speech-to-speech (STS) models into the MLX-Audio library to expand our offerings beyond the current Kokoro model.

##...