Fixes and refactoring of Swift library and demo app

Open • DePasqualeOrg opened this issue 1 month ago • 8 comments

I've fixed and refactored the Swift parts of this repo:

  • Fixed Orpheus (now ~2.4 RTF on M3)
  • Added OuteTTS (~4.3 RTF on M3)
  • Minor fixes for Kokoro and Marvis
  • Fixed several crashes
  • Fixed MLX usage
  • Consolidated iOS and macOS apps into one multi-platform app
  • Cleaned up the UI
  • Used the latest SwiftUI patterns
  • Reduced code duplication
  • Loading espeak-ng as a package instead of bundling with the package
  • Loading model and tokenizer files from Hugging Face instead of bundling with the package
  • Separated library files for later publication as a package
  • Migrated to Swift 6.2 using the latest concurrency patterns for thread safety

The final step will be to move the Swift parts into one or more separate repos. I would suggest separate repos for the library (files currently in mlx_audio_swift/tts/MLXAudio as well as Package.swift) and the demo app, so that the demo app can import the Swift package from GitHub.

For LLMs in Swift we currently have a repo called mlx-swift-lm. Following this pattern, we could name the Swift package repo mlx-swift-audio.

@Blaizzy, we could also download the Kokoro voices from Hugging Face instead of bundling these heavy JSON files if they are uploaded as .safetensors instead of Pickle files.

DePasqualeOrg • Nov 25 '25 21:11

Well, awesome work! That's a big PR to review.

> Loading espeak-ng as a package instead of bundling with the package

I haven't gone through the changes yet, but we want to move espeak-ng to https://github.com/Blaizzy/EspeakNG-Swift because of its licensing and to separate it from Marvis, so that developers can easily use Marvis without having to use Kokoro or its dependencies.

rudrankriyam • Nov 30 '25 04:11

The espeak-ng organization already has this Swift package, which I'm using in this PR: https://github.com/espeak-ng/espeak-ng-spm

DePasqualeOrg • Nov 30 '25 13:11

Well done @DePasqualeOrg!

> @Blaizzy, we could also download the Kokoro voices from Hugging Face instead of bundling these heavy JSON files if they are uploaded as .safetensors instead of Pickle files.

I agree, this makes sense, you can implement it 👍🏾

Blaizzy • Dec 01 '25 15:12

> For LLMs in Swift we currently have a repo called mlx-swift-lm. Following this pattern, we could name the Swift package repo mlx-swift-audio.

Interesting proposal. I don't have a strong preference between mlx-audio-swift and mlx-swift-audio. Either works, but the former seems better from a discoverability standpoint.

Blaizzy • Dec 01 '25 15:12

> The espeak-ng organization already has this Swift package, which I'm using in this PR: https://github.com/espeak-ng/espeak-ng-spm

@rudrankriyam what are your thoughts on this?

Blaizzy • Dec 01 '25 15:12

> Well done @DePasqualeOrg!
>
>> @Blaizzy, we could also download the Kokoro voices from Hugging Face instead of bundling these heavy JSON files if they are uploaded as .safetensors instead of Pickle files.
>
> I agree, this makes sense, you can implement it 👍🏾

How would you like to handle the Hugging Face repo with the voices? They're currently in Pickle format, which is unsafe to load because unpickling can execute arbitrary code. For Swift we need .safetensors files.
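To illustrate why .safetensors is the easier format to consume from Swift, here is a minimal sketch that reads only the header of a .safetensors blob using Foundation. It relies on the published file layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). The names `TensorInfo`, `SafetensorsError`, and `parseSafetensorsHeader` are hypothetical and not taken from the PR; a real app would more likely use the loader that ships with MLX Swift.

```swift
import Foundation

/// Metadata for one tensor as described in a .safetensors header.
struct TensorInfo: Decodable, Equatable {
    let dtype: String
    let shape: [Int]
    let dataOffsets: [Int]  // [begin, end) relative to the start of the data section

    enum CodingKeys: String, CodingKey {
        case dtype, shape
        case dataOffsets = "data_offsets"
    }
}

enum SafetensorsError: Error {
    case truncated
    case malformedEntry(String)
}

/// A header value is either a tensor description or something else
/// (in practice the optional "__metadata__" string map).
private enum HeaderEntry: Decodable {
    case tensor(TensorInfo)
    case other

    init(from decoder: Decoder) throws {
        if let info = try? TensorInfo(from: decoder) {
            self = .tensor(info)
        } else {
            self = .other
        }
    }
}

/// Parses the header only. Nothing from the file is executed:
/// the header is plain JSON and the payload is raw bytes.
func parseSafetensorsHeader(_ data: Data) throws -> [String: TensorInfo] {
    guard data.count >= 8 else { throw SafetensorsError.truncated }

    // The first 8 bytes hold the header length as a little-endian UInt64.
    var headerLength: UInt64 = 0
    for (i, byte) in data.prefix(8).enumerated() {
        headerLength |= UInt64(byte) << UInt64(8 * i)
    }
    guard headerLength <= UInt64(data.count - 8) else { throw SafetensorsError.truncated }

    let start = data.startIndex + 8
    let header = Data(data[start ..< start + Int(headerLength)])
    let entries = try JSONDecoder().decode([String: HeaderEntry].self, from: header)

    var tensors: [String: TensorInfo] = [:]
    for (name, entry) in entries {
        switch entry {
        case .tensor(let info):
            tensors[name] = info
        case .other:
            // Only the metadata key may hold a non-tensor value.
            guard name == "__metadata__" else { throw SafetensorsError.malformedEntry(name) }
        }
    }
    return tensors
}
```

Because the header lists each tensor's dtype, shape, and byte range up front, a loader can validate a file before touching the payload, which is not possible with Pickle.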

DePasqualeOrg • Dec 01 '25 15:12

@rudrankriyam could you handle the voices? If you come across any issues let me know.

Blaizzy • Dec 01 '25 15:12

@Blaizzy, I added the voices to the Hugging Face repo in .safetensors format here: https://huggingface.co/mlx-community/Kokoro-82M-bf16/discussions/1

This will allow them to be downloaded in the Swift app instead of bundling converted files.

DePasqualeOrg • Dec 01 '25 15:12

Here's what the multi-platform app currently looks like on macOS and iOS:

[Screenshots: the multi-platform app on macOS and iOS]

DePasqualeOrg • Dec 02 '25 15:12

The CI build test is failing because I've used some newer Swift syntax that requires iOS 18.4/macOS 15.4 or newer (specifically, Atomic and isolated deinit). These help a lot with resolving concurrency issues.
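For readers unfamiliar with the two features, here is a minimal sketch of how they are typically used. It assumes a Swift 6.2 toolchain, and both types (`GenerationCounter`, `PlaybackController`) are hypothetical illustrations rather than code from the PR.

```swift
import Synchronization

/// A thread-safe counter. `Atomic` allows lock-free mutation, so the class
/// can be `Sendable` without becoming an actor or taking a lock.
@available(macOS 15.4, iOS 18.4, *)
final class GenerationCounter: Sendable {
    private let value = Atomic<Int>(0)

    /// Adds one and returns the new value.
    @discardableResult
    func increment() -> Int {
        value.add(1, ordering: .relaxed).newValue
    }

    var current: Int {
        value.load(ordering: .relaxed)
    }
}

/// A main-actor type whose cleanup touches actor-isolated state.
@available(macOS 15.4, iOS 18.4, *)
@MainActor
final class PlaybackController {
    private var isPlaying = false

    func start() {
        isPlaying = true
    }

    // `isolated deinit` runs on the main actor, so it may access `isPlaying`.
    // A plain `deinit` is nonisolated and cannot safely do this.
    isolated deinit {
        isPlaying = false
    }
}
```

Without these features, the same guarantees usually require a lock around the counter and a detached cleanup task in place of the deinitializer, which is the ergonomic cost weighed here.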

What do the other maintainers of this repo think: Is it acceptable to require at least last year's versions of iOS and macOS to run this library? By now around 95% of users are running compatible OS versions. My preference is to prioritize code ergonomics rather than supporting old OS versions that a diminishing fraction of users will be running.

If you're okay with this, we should update the CI settings accordingly.
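On the package side, the minimum OS versions would be declared in the manifest. The following is a sketch only: the package name follows the proposal earlier in this thread, and the `MLXAudio` product and target names are assumptions based on the current directory name, not a confirmed layout.

```swift
// swift-tools-version: 6.2
import PackageDescription

let package = Package(
    name: "mlx-swift-audio",  // proposed name from this thread, not final
    platforms: [
        // Minimum versions needed for Atomic and isolated deinit, per the comment above.
        .macOS("15.4"),
        .iOS("18.4"),
    ],
    products: [
        // Hypothetical product name.
        .library(name: "MLXAudio", targets: ["MLXAudio"]),
    ],
    targets: [
        // Hypothetical target; dependencies are omitted in this sketch.
        .target(name: "MLXAudio"),
    ]
)
```

The CI job would then need a toolchain and SDK that support these deployment targets.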

Cc @Blaizzy @lucasnewman @rudrankriyam

DePasqualeOrg • Dec 02 '25 16:12

I'm planning to do some extensive work on Swift MLX audio tooling and apps over the coming months, which I'll continue in my own repo: https://github.com/DePasqualeOrg/mlx-swift-audio

I've preserved the commit history from this repo for the relevant files there. If you're interested in contributing, let me know so that we can coordinate our efforts.

DePasqualeOrg • Dec 03 '25 09:12