
Add ESpeakNG.xcframework for iOS on Kokoro MLX Audio Swift

Open rudrankriyam opened this issue 9 months ago • 7 comments

Added ESpeakNG.xcframework slice for iOS (real device) based on script from https://github.com/mlalma/kokoro-ios

The framework contains multiple 'slices,' including versions for macOS and iOS. While linking is sufficient for macOS to find and use its slice, iOS has stricter requirements.

The core issue was that the iOS slice of ESpeakNG.xcframework was not being correctly embedded into the Swift-TTS app bundle for the iOS target.

To get it to work:

  • I adjusted the Runpath Search Paths for the iOS target to the standard @executable_path/Frameworks, which tells iOS where to look for embedded frameworks.
  • Then, I moved ESpeakNG.xcframework to the main Frameworks directory and set it to "Embed & Sign" in the "Frameworks, Libraries, and Embedded Content" section so that it is embedded in the iOS app.

This path change helps Xcode process the framework for the iOS target; without it, the app runs into a 'dyld: Library not loaded' error.
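
One way to confirm that the embed step took effect is to check at runtime that the framework actually sits in the app bundle's Frameworks directory, which is where `@executable_path/Frameworks` (the `LD_RUNPATH_SEARCH_PATHS` build setting) points on iOS. The following is a minimal diagnostic sketch, not part of this PR: the helper `isFrameworkEmbedded` is hypothetical, and the embedded framework name `ESpeakNG` is an assumption based on the xcframework's name.

```swift
import Foundation

/// Hypothetical helper: returns true if `<name>.framework` exists as a directory
/// inside the given Frameworks directory. By default it looks in the running
/// bundle's embedded Frameworks folder.
func isFrameworkEmbedded(
    named name: String,
    in frameworksDirectory: URL? = Bundle.main.privateFrameworksURL
) -> Bool {
    guard let directory = frameworksDirectory else { return false }
    let frameworkURL = directory.appendingPathComponent("\(name).framework", isDirectory: true)
    var isDirectory: ObjCBool = false
    let exists = FileManager.default.fileExists(atPath: frameworkURL.path,
                                                isDirectory: &isDirectory)
    return exists && isDirectory.boolValue
}

// "ESpeakNG" is an assumed framework name; adjust it to match the embedded product.
if !isFrameworkEmbedded(named: "ESpeakNG") {
    print("ESpeakNG.framework is not in the bundle; check Embed & Sign for the iOS target.")
}
```

If the check fails on device while the macOS build works, that points at the embed step rather than at linking.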

What do you think of it? Any better alternative?

rudrankriyam avatar May 11 '25 18:05 rudrankriyam

Your overall approach looks fine, but we'd want to add the source and project file, not the binary framework, and build it as a dependency of the MLXAudio framework (or example app, etc) in Swift.

In a perfect world it would reference the official code as a submodule, but it looks like the project structure differs quite a bit so I'm not sure how much effort that would be.

lucasnewman avatar May 11 '25 20:05 lucasnewman

Thank you very much @rudrankriyam!

I agree with you @lucasnewman, but from my initial testing this is the fastest way to unlock iOS.

Since this is only for Kokoro, what do you think about moving forward with this approach in the meantime while we work on a more robust solution?

Blaizzy avatar May 11 '25 20:05 Blaizzy

I've been playing with it today. I got Kokoro running on my iOS device with multiple voices, but there are limitations in kokoro-ios. The token limit (around 500) and memory issues are the biggest. I've split long texts into chunks and run them through Kokoro TTS one after another. Pronunciation is also an issue.
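
The chunking workaround described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code used in this comment: `chunkText` is a hypothetical helper, it uses character count as a stand-in for the model's token count, and it breaks only at sentence-ending punctuation.

```swift
import Foundation

/// Hypothetical helper: splits `text` into chunks of at most `maxLength` characters,
/// breaking at sentence ends. A single sentence longer than the limit is kept whole,
/// so such a chunk can exceed `maxLength`.
func chunkText(_ text: String, maxLength: Int) -> [String] {
    precondition(maxLength > 0, "maxLength must be positive")

    // 1. Cut the text into sentences at '.', '!' or '?'.
    var sentences: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            sentences.append(current.trimmingCharacters(in: .whitespacesAndNewlines))
            current = ""
        }
    }
    let tail = current.trimmingCharacters(in: .whitespacesAndNewlines)
    if !tail.isEmpty { sentences.append(tail) }

    // 2. Greedily pack consecutive sentences into chunks that fit the limit.
    var chunks: [String] = []
    var chunk = ""
    for sentence in sentences {
        let candidate = chunk.isEmpty ? sentence : chunk + " " + sentence
        if candidate.count <= maxLength {
            chunk = candidate
        } else {
            if !chunk.isEmpty { chunks.append(chunk) }
            chunk = sentence
        }
    }
    if !chunk.isEmpty { chunks.append(chunk) }
    return chunks
}

// Each chunk would then be handed to the TTS engine in order, one at a time.
let chunks = chunkText("Hello there. How are you? I am fine.", maxLength: 25)
print(chunks)  // ["Hello there. How are you?", "I am fine."]
```

Processing the chunks sequentially rather than in parallel keeps only one synthesis in flight, which is the simpler choice given the memory limits mentioned above.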

niklasmato avatar May 11 '25 20:05 niklasmato

@niklasmato yeah, as for the pronunciation part, kokoro-ios mentions that it uses a different phonemizer:

The project uses eSpeak NG as a phonemizer, which is different from what the original Kokoro TTS uses. This can and will cause differences in the output audio.

rudrankriyam avatar May 11 '25 20:05 rudrankriyam

Thanks @niklasmato this is great feedback!

Could you please try out this PR and, if you run into the same limitations, open an issue with some examples?

Blaizzy avatar May 11 '25 20:05 Blaizzy

@lucasnewman what do you think?

Blaizzy avatar May 13 '25 20:05 Blaizzy

@rudrankriyam could you resolve the conflicts?

Blaizzy avatar May 13 '25 20:05 Blaizzy

#138 has added the iOS framework, so closing this.

rudrankriyam avatar May 14 '25 08:05 rudrankriyam