mlx-whisper OOM error on files > 1GB
When I try to transcribe large files, mlx-whisper is consistently crashing with kIOGPUCommandBufferCallbackErrorOutOfMemory. Do you have any advice as to what flags to use to assist with processing larger files? I've tried different models and specifying the language with no difference in outcome.
(.venv) rparrish@oracle absrefined % python --version
Python 3.11.12
(.venv) rparrish@oracle absrefined % uv pip list|grep mlx
mlx 0.25.1
mlx-whisper 0.4.2
(.venv) rparrish@oracle absrefined % ls -l temp
total 6269632
-rw-r--r--@ 1 rparrish staff 1202700876 May 4 17:05 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
-rw-r--r--@ 1 rparrish staff 1133814881 May 4 21:25 90ce63ba-2de8-4ab5-8fc4-5e367dad52df_full.m4a
-rw-r--r--@ 1 rparrish staff 527000201 May 4 18:47 da1dfe53-2846-45a5-ba7f-61ef08221d5f_full.m4a
-rw-r--r--@ 1 rparrish staff 7272845 May 4 18:52 da1dfe53-2846-45a5-ba7f-61ef08221d5f_full_audio.jsonl
-rw-r--r--@ 1 rparrish staff 316813460 May 4 17:26 fba0c82e-22a4-443d-9fa4-6b7da7548f14_full.m4a
-rw-r--r--@ 1 rparrish staff 4435343 May 4 17:30 fba0c82e-22a4-443d-9fa4-6b7da7548f14_full_audio.jsonl
(.venv) rparrish@oracle absrefined % mlx_whisper temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
Args: {'audio': ['temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a'], 'model': 'mlx-community/whisper-tiny', 'output_name': None, 'output_dir': '.', 'output_format': 'txt', 'verbose': True, 'task': 'transcribe', 'language': None, 'temperature': 0, 'best_of': 5, 'patience': None, 'length_penalty': None, 'suppress_tokens': '-1', 'initial_prompt': None, 'condition_on_previous_text': True, 'fp16': True, 'compression_ratio_threshold': 2.4, 'logprob_threshold': -1.0, 'no_speech_threshold': 0.6, 'word_timestamps': False, 'prepend_punctuations': '"\'“¿([{-', 'append_punctuations': '"\'.。,,!!??::”)]}、', 'highlight_words': False, 'max_line_width': None, 'max_line_count': None, 'max_words_per_line': None, 'hallucination_silence_threshold': None, 'clip_timestamps': '0'}
Fetching 4 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 93206.76it/s]
Detecting language using up to the first 30 seconds. Use the `language` decoding option to specify the language
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
zsh: abort mlx_whisper temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
/Users/rparrish/.local/share/uv/python/cpython-3.11.12-macos-aarch64-none/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
^C%
The 500MB file shown in the directory will transcribe without issue, with only a moderate memory spike before processing.
How long are the files that cause issues? It looks like about 1 GB of audio? What's the sample rate and duration?
Can you run ffmpeg -i 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a and share the output here?
I think the problem is that computing the log-mel spectrogram is pretty memory intensive when the audio gets long.
It should be doable to batch that computation.
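To illustrate what batching that computation could look like, here is a minimal NumPy sketch. It is not mlx-whisper's code: the function names and the frames_per_chunk parameter are made up for illustration, and it only covers the STFT power stage using Whisper's usual 400-sample window and 160-sample hop. The mel projection and log are per-frame operations, so they could be applied to each chunk in the same loop. Whisper's reference preprocessing also clamps against the global maximum of the log spectrogram, which a batched version would need to track across chunks.

```python
import numpy as np

N_FFT = 400  # Whisper's STFT window length in samples
HOP = 160    # Whisper's hop length in samples


def frame_power(audio, n_fft=N_FFT, hop=HOP):
    """Power spectrogram of every complete frame in `audio` (no padding)."""
    n_frames = 1 + (len(audio) - n_fft) // hop
    if n_frames <= 0:
        return np.zeros((0, n_fft // 2 + 1))
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    # Row i holds the sample indices of frame i.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * window
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def chunked_power(audio, frames_per_chunk=3000, n_fft=N_FFT, hop=HOP):
    """Yield the same frames as frame_power, a bounded number at a time."""
    total = 1 + (len(audio) - n_fft) // hop
    for start in range(0, total, frames_per_chunk):
        stop = min(start + frames_per_chunk, total)
        lo = start * hop               # first sample of frame `start`
        hi = (stop - 1) * hop + n_fft  # one past the last sample of frame `stop - 1`
        yield frame_power(audio[lo:hi], n_fft, hop)
```

Each slice overlaps the next by n_fft - hop samples, so concatenating the chunks gives the same frames as one big call, while peak memory for the intermediates scales with frames_per_chunk instead of the file length.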
It's a 21-hour audiobook, so pretty large. It does work when I break it up into 500 MB chunks. I was just under the (perhaps misguided) impression that Whisper was already doing the chunking in 30-second blocks, as this runs fine when I use openai/whisper, albeit slowly on CPU, with minimal RAM usage.
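For anyone hitting the same limit, the splitting workaround can be scripted. The sketch below only builds the ffmpeg command lines; it does not run them. The one-hour chunk length, the chunk_NNN.wav output names, and both function names are arbitrary choices for illustration. The flags used (-ss, -t, -i, -vn, -ac, -ar) are standard ffmpeg options. Cutting at fixed offsets can split a word in half, so cutting at chapter boundaries may give cleaner results.

```python
def chunk_ranges(duration_s, chunk_s=3600.0):
    """Return (start, length) pairs in seconds covering [0, duration_s)."""
    ranges = []
    start = 0.0
    while start < duration_s:
        ranges.append((start, min(chunk_s, duration_s - start)))
        start += chunk_s
    return ranges


def ffmpeg_split_commands(src, duration_s, chunk_s=3600.0):
    """Build one ffmpeg command per chunk, decoding to 16 kHz mono WAV."""
    cmds = []
    for i, (start, length) in enumerate(chunk_ranges(duration_s, chunk_s)):
        out = f"chunk_{i:03d}.wav"  # hypothetical output naming scheme
        cmds.append([
            "ffmpeg",
            "-ss", f"{start:.3f}",   # seek to the chunk start
            "-t", f"{length:.3f}",   # read only this much audio
            "-i", src,
            "-vn",                   # drop the attached cover image
            "-ac", "1",              # mono
            "-ar", "16000",          # 16 kHz, the rate Whisper works at
            out,
        ])
    return cmds
```

Each command can be passed to subprocess.run, and each resulting file transcribed on its own. Timestamps in a chunk's transcript are relative to that chunk, so the chunk's start offset has to be added back when merging.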
➜ Downloads ffmpeg -i 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
built with Apple clang version 16.0.0 (clang-1600.0.26.6)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.1.1_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.101 / 61. 19.101
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x131605950] stream 0, timescale not set
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: iso2mp41M4A M4B
creation_time : 2024-02-27T23:08:14.000000Z
genre : Fantasy, Space Opera
title : Dune
artist : Frank Herbert
album_artist : Frank Herbert
album : Dune
comment : Here is the novel that will be forever considered a triumph of the imagination. Set on the desert planet Arrakis, \nDune is the story of the boy Paul Atreides, who would become the mysterious man known as Maud'dib. He would avenge the traitorous plot agai
copyright : ©1965 Frank Herbert ℗2007 Audio Renaissance, a division of Holtzbrinck Publishers LLC
date : 2006
composer : Scott Brick, Orlagh Cassidy, Euan Morton, Simon Vance, Ilyana Kadushin, Byron Jennings, David R. Gordon, Jason Culp, Kent Broadhurst, Oliver Wyman, Patricia Kilgarriff, Scott Sowers
PUBLISHER : Macmillan Audio
LANGUAGE : English
AUDIBLE_ASIN : B002V1OF70
SERIES : Dune
PART : 1
Duration: 21:02:06.68, start: 0.000000, bitrate: 127 kb/s
Chapters:
Chapter #0:0: start 0.000000, end 57.000000
Metadata:
title : Opening Credits
Chapter #0:1: start 57.000000, end 115.200000
Metadata:
title : Book One: Dune
Chapter #0:2: start 115.200000, end 1791.228980
Metadata:
title : Chapter 1
Chapter #0:3: start 1791.228980, end 3044.599977
Metadata:
title : Chapter 2
Chapter #0:4: start 3044.599977, end 4040.828957
Metadata:
title : Chapter 3
Chapter #0:5: start 4040.828957, end 5426.828957
Metadata:
title : Chapter 4
Chapter #0:6: start 5426.828957, end 6029.278957
Metadata:
title : Chapter 5
Chapter #0:7: start 6029.278957, end 6967.258957
Metadata:
title : Chapter 6
Chapter #0:8: start 6967.258957, end 8584.276939
Metadata:
title : Chapter 7
Chapter #0:9: start 8584.276939, end 9890.276939
Metadata:
title : Chapter 8
Chapter #0:10: start 9890.276939, end 10470.276939
Metadata:
title : Chapter 9
Chapter #0:11: start 10470.276939, end 11704.276939
Metadata:
title : Chapter 10
Chapter #0:12: start 11704.276939, end 12469.276939
Metadata:
title : Chapter 11
Chapter #0:13: start 12469.276939, end 14789.369932
Metadata:
title : Chapter 12
Chapter #0:14: start 14789.369932, end 15531.869932
Metadata:
title : Chapter 13
Chapter #0:15: start 15531.869932, end 16026.869932
Metadata:
title : Chapter 14
Chapter #0:16: start 16026.869932, end 19187.135918
Metadata:
title : Chapter 15
Chapter #0:17: start 19187.135918, end 22429.405918
Metadata:
title : Chapter 16
Chapter #0:18: start 22429.405918, end 24278.405918
Metadata:
title : Chapter 17
Chapter #0:19: start 24278.405918, end 25034.405918
Metadata:
title : Chapter 18
Chapter #0:20: start 25034.405918, end 26578.918912
Metadata:
title : Chapter 19
Chapter #0:21: start 26578.918912, end 26974.918912
Metadata:
title : Chapter 20
Chapter #0:22: start 26974.918912, end 28985.918912
Metadata:
title : Chapter 21
Chapter #0:23: start 28985.918912, end 31186.713900
Metadata:
title : Chapter 22
Chapter #0:24: start 31186.713900, end 31192.713900
Metadata:
title : Book Two: Muad’Dib
Chapter #0:25: start 31192.713900, end 31900.713900
Metadata:
title : Chapter 23
Chapter #0:26: start 31900.713900, end 33575.713900
Metadata:
title : Chapter 24
Chapter #0:27: start 33575.713900, end 35465.301882
Metadata:
title : Chapter 25
Chapter #0:28: start 35465.301882, end 36966.084875
Metadata:
title : Chapter 26
Chapter #0:29: start 36966.084875, end 39195.084875
Metadata:
title : Chapter 27
Chapter #0:30: start 39195.084875, end 40109.954875
Metadata:
title : Chapter 28
Chapter #0:31: start 40109.954875, end 41569.954875
Metadata:
title : Chapter 29
Chapter #0:32: start 41569.954875, end 42779.954875
Metadata:
title : Chapter 30
Chapter #0:33: start 42779.954875, end 44512.091859
Metadata:
title : Chapter 31
Chapter #0:34: start 44512.091859, end 45852.091859
Metadata:
title : Chapter 32
Chapter #0:35: start 45852.091859, end 47897.091859
Metadata:
title : Chapter 33
Chapter #0:36: start 47897.091859, end 49880.559841
Metadata:
title : Chapter 34
Chapter #0:37: start 49880.559841, end 52717.559841
Metadata:
title : Chapter 35
Chapter #0:38: start 52717.559841, end 54110.665828
Metadata:
title : Chapter 36
Chapter #0:39: start 54110.665828, end 56427.665828
Metadata:
title : Chapter 37
Chapter #0:40: start 56427.665828, end 56432.665828
Metadata:
title : Book Three: The Prophet
Chapter #0:41: start 56432.665828, end 57658.410816
Metadata:
title : Chapter 38
Chapter #0:42: start 57658.410816, end 58710.410816
Metadata:
title : Chapter 39
Chapter #0:43: start 58710.410816, end 60605.165805
Metadata:
title : Chapter 40
Chapter #0:44: start 60605.165805, end 62134.165805
Metadata:
title : Chapter 41
Chapter #0:45: start 62134.165805, end 63206.165805
Metadata:
title : Chapter 42
Chapter #0:46: start 63206.165805, end 65643.945805
Metadata:
title : Chapter 43
Chapter #0:47: start 65643.945805, end 67658.179796
Metadata:
title : Chapter 44
Chapter #0:48: start 67658.179796, end 69265.179796
Metadata:
title : Chapter 45
Chapter #0:49: start 69265.179796, end 70546.179796
Metadata:
title : Chapter 46
Chapter #0:50: start 70546.179796, end 72122.006780
Metadata:
title : Chapter 47
Chapter #0:51: start 72122.006780, end 75677.006780
Metadata:
title : Chapter 48
Chapter #0:52: start 75677.006780, end 75726.648776
Metadata:
title : End Credits
Stream #0:0[0x1](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
Metadata:
creation_time : 2024-02-27T23:08:14.000000Z
handler_name : ?Apple Sound Media Handler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](eng): Data: bin_data (text / 0x74786574) (default)
Metadata:
creation_time : 2024-02-27T23:08:15.000000Z
handler_name : ?Apple Text Media Handler
Stream #0:2[0x0]: Video: mjpeg (Progressive), yuvj420p(pc, bt470bg/unknown/unknown), 2400x2400 [SAR 1:1 DAR 1:1], 90k tbr, 90k tbn (attached pic)
At least one output file must be specified
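A rough back-of-the-envelope estimate, using the duration reported above, shows why a file this long is a problem if the whole spectrogram is computed in one pass. This is a sketch under assumptions, not a measurement of mlx-whisper: it assumes Whisper's usual preprocessing (16 kHz mono, 400-sample window, 160-sample hop, 80 mel bins), float32 samples, and a complex64 STFT. The function names are made up for illustration.

```python
def parse_duration(text):
    """Convert an ffmpeg 'HH:MM:SS.ss' duration string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def estimate_bytes(duration_s, sample_rate=16_000, n_fft=400, hop=160, n_mels=80):
    """Approximate buffer sizes if the whole file is processed in one pass."""
    n_samples = round(duration_s * sample_rate)
    n_frames = n_samples // hop
    return {
        "audio_f32": n_samples * 4,                   # decoded samples
        "stft_c64": n_frames * (n_fft // 2 + 1) * 8,  # complex STFT output
        "mel_f32": n_frames * n_mels * 4,             # final mel features
    }


if __name__ == "__main__":
    sizes = estimate_bytes(parse_duration("21:02:06.68"))
    for name, size in sizes.items():
        print(f"{name}: {size / 1e9:.2f} GB")
```

Under these assumptions the decoded audio alone is nearly 5 GB, the STFT is about 12 GB, and the mel output is about 2.4 GB, before counting any temporaries. That would be consistent with the 500 MB file working and the 1.2 GB file failing, though the actual peak depends on how the library allocates its intermediates.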