mlx-audio Bug Report: Swift-TTS-iOS Crash Due to Excessive Memory Usage

The app Swift-TTS-iOS crashes during execution due to excessive memory usage. The operating system terminates the app with the following message:

App: Swift-TTS-iOS
Device: iPhone 15
OS: iOS (latest at the time of testing)

Temporary Workaround: Limiting GPU memory manually using the MLX API helps mitigate the issue:

MLX.GPU.set(memoryLimit: 500 * 1024 * 1024) // Limit to 500MB or ??

May 22 '25 02:05 thuongvovan

Yeah, we recently fixed memory spikes here #165

Please give it a try and let me know if the issues continue

I'm not sure limiting memory usage is the best approach, because some iOS devices have more resources than others, and the same goes for models of different sizes.
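One way to avoid a single hard-coded number might be to derive the limit from the device's RAM. This is only a rough sketch: `suggestedMemoryLimit` is a hypothetical helper, not part of MLX or Swift-TTS, and the 25% fraction and the 500 MB / 3000 MB bounds are placeholder assumptions that would need tuning per device and per model.

```swift
import Foundation

/// Hypothetical helper (not an MLX API): choose a memory limit in bytes as a
/// fraction of physical RAM, clamped between a floor and a ceiling.
/// The default fraction and bounds are untested placeholders.
func suggestedMemoryLimit(
    physicalMemory: UInt64 = ProcessInfo.processInfo.physicalMemory,
    fraction: Double = 0.25,
    floorBytes: Int = 500 * 1024 * 1024,
    ceilingBytes: Int = 3000 * 1024 * 1024
) -> Int {
    // Scale total RAM by the chosen fraction.
    let scaled = Int(Double(physicalMemory) * fraction)
    // Clamp so small devices still get the floor and large ones stop at the ceiling.
    return min(max(scaled, floorBytes), ceilingBytes)
}

// In the app, the result would be passed to the call already used in this thread:
// MLX.GPU.set(memoryLimit: suggestedMemoryLimit())
```

Larger models may still need a higher floor, so this only addresses the device half of the concern.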

May 24 '25 11:05 Blaizzy

It sometimes still gets killed by iOS's jetsam. With a higher memory limit, how much faster is generation for you guys? On my iPhone 16 I've tried N = 500, 1000, 2000, and 3000 in MLX.GPU.set(memoryLimit: N * 1024 * 1024), but the generation speed seems to be roughly the same, although memory may spike to around 1000, 2000, or 3000 MB.

On Kokoro, some text that takes 64s at 500 MB takes about 62s at 3000 MB, and sometimes the time is the same.
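To put a number on "roughly the same", here is a small sketch for timing a run and computing the relative speedup. Both function names are hypothetical, and `elapsedSeconds` assumes the work passed in runs synchronously; an asynchronous generation call would need to be awaited inside the closure or timed differently.

```swift
import Foundation

/// Hypothetical helper: run `work` once and return elapsed wall-clock seconds.
/// Assumes `work` is synchronous and returns only when generation has finished.
func elapsedSeconds(_ work: () throws -> Void) rethrows -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000_000
}

/// Fraction of the baseline time saved by the candidate run.
func relativeSpeedup(baselineSeconds: Double, candidateSeconds: Double) -> Double {
    return (baselineSeconds - candidateSeconds) / baselineSeconds
}

// The 64s vs 62s runs above work out to about a 3% difference.
let gain = relativeSpeedup(baselineSeconds: 64, candidateSeconds: 62)
print(gain) // 0.03125
```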

Jul 05 '25 07:07 sickerin

I can confirm that this still seems to be an issue. I'm pretty new to the Swift-TTS library, but I have just been running some tests on an iPhone 17 Pro on iOS 26. The stock Swift-TTS-iOS app works fine for small text snippets (1-2 paragraphs), but when you paste in an entire article it crashes after generating for a short while. Watching the performance tab in Xcode, the highest memory peak I saw was 2.95GB before crashing.

If I add MLX.GPU.set(memoryLimit: 500 * 1024 * 1024) under 'actionButtonsView', it works fine. The entire Kokoro code then looks as follows:

    // Prepare text and speaker for Kokoro
    let speaker = speakerModel.getPrimarySpeaker().first!

    // Set memory constraints for MLX and start generation
    MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)
    MLX.GPU.set(memoryLimit: 500 * 1024 * 1024)
    kokoroViewModel.say(t, TTSVoice.fromIdentifier(speaker.name) ?? .afHeart, speed: Float(speed))

I still see memory peak at around 1.2GB with this setting in place, but the app no longer crashes. It seems to hover around 480MB for most of the generation.

Sep 27 '25 05:09 bradleyandrew