
Add Chatterbox

Open DePasqualeOrg opened this issue 1 month ago • 1 comments

I've ported Chatterbox to MLX in Python, and it's working well. I haven't uploaded the weights to Hugging Face yet, in case any adjustments need to be made.

The 4-bit quantized model is about half the size of the full-precision model (~1.6GB vs. ~3GB) and produces good results.

I was able to reuse and extend the existing S3 tokenizer.

You'll need to provide a short (5- to 10-second) sample recording of a voice to generate speech.
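As a quick sanity check before generating speech, the length of the reference recording can be verified. The following is a minimal sketch, not part of the port itself: it uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file, and the helper names `wav_duration_seconds` and `is_suitable_reference` are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def is_suitable_reference(path, min_seconds=5.0, max_seconds=10.0):
    """Check that a recording falls in the suggested 5- to 10-second range.

    The bounds mirror the guidance in this issue; they are not enforced
    by the model and can be adjusted.
    """
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Calling `is_suitable_reference("sample.wav")` returns `True` when the file is between 5 and 10 seconds long.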

Convert weights and save locally

# Full precision (~3GB)
python mlx_audio/tts/models/chatterbox/scripts/convert_chatterbox.py -o ./Chatterbox-TTS-fp16

# 4-bit quantized (~1.6GB, quantizes T3 backbone only)
python mlx_audio/tts/models/chatterbox/scripts/convert_chatterbox.py -o ./Chatterbox-TTS-4bit --quantize

Generate speech with reference audio (voice cloning)

python -m mlx_audio.tts.generate \
  --model ./Chatterbox-TTS-4bit \
  --text "Hello, this is my cloned voice." \
  --ref_audio sample.wav \
  --play
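
To call the same command from a Python script, the argument list can be assembled programmatically. This is only a sketch under stated assumptions: the flags are the ones shown in the command above, while the helper name `build_generate_command` is hypothetical and the paths are placeholders.

```python
import shlex
import sys


def build_generate_command(model, text, ref_audio, play=True):
    """Assemble the argv list for `python -m mlx_audio.tts.generate`.

    Only the flags that appear in this issue are used.
    """
    command = [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--ref_audio", ref_audio,
    ]
    if play:
        command.append("--play")
    return command


command = build_generate_command(
    model="./Chatterbox-TTS-4bit",
    text="Hello, this is my cloned voice.",
    ref_audio="sample.wav",
)
# Print a shell-quoted version for inspection; to actually run it,
# pass the list to subprocess.run(command, check=True).
print(shlex.join(command))
```

Passing the arguments as a list avoids shell-quoting problems when the text contains quotes or other special characters.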

DePasqualeOrg, Nov 30 '25 21:11

The model is now available in the MLX Community on Hugging Face:

https://huggingface.co/mlx-community/Chatterbox-TTS-fp16
https://huggingface.co/mlx-community/Chatterbox-TTS-8bit
https://huggingface.co/mlx-community/Chatterbox-TTS-4bit

You can try it with this branch of mlx-audio:

mlx_audio.tts --model mlx-community/Chatterbox-TTS-4bit --text "Hello, this is Chatterbox on MLX!" --ref_audio reference.wav --ref_text "."

DePasqualeOrg, Dec 01 '25 17:12

I'm closing this in favor of further development on my own fork of this repo.

DePasqualeOrg, Dec 04 '25 15:12