mlx-examples

mlx-whisper OOM error on files > 1GB

Open stickystyle opened this issue 7 months ago • 4 comments

When I try to transcribe large files, mlx-whisper is consistently crashing with kIOGPUCommandBufferCallbackErrorOutOfMemory. Do you have any advice as to what flags to use to assist with processing larger files? I've tried different models and specifying the language with no difference in outcome.

(.venv) rparrish@oracle absrefined % python --version
Python 3.11.12
(.venv) rparrish@oracle absrefined % uv pip list|grep mlx
mlx                0.25.1
mlx-whisper        0.4.2
(.venv) rparrish@oracle absrefined % ls -l temp
total 6269632
-rw-r--r--@ 1 rparrish  staff  1202700876 May  4 17:05 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
-rw-r--r--@ 1 rparrish  staff  1133814881 May  4 21:25 90ce63ba-2de8-4ab5-8fc4-5e367dad52df_full.m4a
-rw-r--r--@ 1 rparrish  staff   527000201 May  4 18:47 da1dfe53-2846-45a5-ba7f-61ef08221d5f_full.m4a
-rw-r--r--@ 1 rparrish  staff     7272845 May  4 18:52 da1dfe53-2846-45a5-ba7f-61ef08221d5f_full_audio.jsonl
-rw-r--r--@ 1 rparrish  staff   316813460 May  4 17:26 fba0c82e-22a4-443d-9fa4-6b7da7548f14_full.m4a
-rw-r--r--@ 1 rparrish  staff     4435343 May  4 17:30 fba0c82e-22a4-443d-9fa4-6b7da7548f14_full_audio.jsonl
(.venv) rparrish@oracle absrefined % mlx_whisper temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
Args: {'audio': ['temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a'], 'model': 'mlx-community/whisper-tiny', 'output_name': None, 'output_dir': '.', 'output_format': 'txt', 'verbose': True, 'task': 'transcribe', 'language': None, 'temperature': 0, 'best_of': 5, 'patience': None, 'length_penalty': None, 'suppress_tokens': '-1', 'initial_prompt': None, 'condition_on_previous_text': True, 'fp16': True, 'compression_ratio_threshold': 2.4, 'logprob_threshold': -1.0, 'no_speech_threshold': 0.6, 'word_timestamps': False, 'prepend_punctuations': '"\'“¿([{-', 'append_punctuations': '"\'.。,,!!??::”)]}、', 'highlight_words': False, 'max_line_width': None, 'max_line_count': None, 'max_words_per_line': None, 'hallucination_silence_threshold': None, 'clip_timestamps': '0'}
Fetching 4 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 93206.76it/s]
Detecting language using up to the first 30 seconds. Use the `language` decoding option to specify the language
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
zsh: abort      mlx_whisper temp/1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
/Users/rparrish/.local/share/uv/python/cpython-3.11.12-macos-aarch64-none/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
^C%


The 500MB file shown in the directory will transcribe without issue, with only a moderate memory spike before processing.


stickystyle • May 05 '25 13:05

How long are the files that cause issues? It looks like about 1GB of audio? What's the sample rate / duration?

awni • May 08 '25 14:05

Can you run ffmpeg -i 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a and share the output here?

awni • May 08 '25 14:05

I think the problem is that computing the log mel spectrograms is pretty memory intensive when the audio gets long.

It should be doable to batch that computation.
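
One way to read "batch that computation": split the waveform into overlapping sample ranges so that each range yields a fixed number of STFT frames, then compute the log mel spectrogram one range at a time and concatenate the results. The sketch below only plans those ranges and is not mlx-whisper's code. The helper names num_frames and chunk_bounds are hypothetical, the window and hop sizes are Whisper's usual 400 and 160 samples, and the edge padding that Whisper's STFT applies is ignored for simplicity.

```python
from typing import Iterator, Tuple

N_FFT = 400       # Whisper's STFT window length, in samples
HOP_LENGTH = 160  # Whisper's STFT hop length, in samples


def num_frames(n_samples: int, n_fft: int = N_FFT, hop: int = HOP_LENGTH) -> int:
    """Number of full STFT frames that fit in n_samples (no padding)."""
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop


def chunk_bounds(
    n_samples: int,
    frames_per_chunk: int,
    n_fft: int = N_FFT,
    hop: int = HOP_LENGTH,
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample ranges, each covering frames_per_chunk frames.

    Consecutive ranges overlap by n_fft - hop samples, so the frames computed
    per chunk line up exactly with the frames of one unbatched pass.
    """
    total = num_frames(n_samples, n_fft, hop)
    for first in range(0, total, frames_per_chunk):
        last = min(first + frames_per_chunk, total)  # exclusive frame index
        start = first * hop                # first sample of the first frame
        end = (last - 1) * hop + n_fft     # one past the last frame's last sample
        yield start, end


if __name__ == "__main__":
    # 100 seconds of 16 kHz audio, 3000 frames (30 seconds) per chunk.
    for start, end in chunk_bounds(16_000 * 100, 3000):
        print(start, end, num_frames(end - start))
```

With ranges like these, only one chunk's STFT needs to be resident at a time; the per-chunk log mel outputs are small enough to concatenate afterwards.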

awni • May 08 '25 15:05

It's a 21-hour audiobook, so pretty large. It does work when I break it up into 500MB chunks. I was just under the (perhaps misguided) impression that whisper was already doing the chunking in 30-second blocks, since this runs fine with openai/whisper, albeit slowly on CPU and with minimal RAM usage.
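
The manual workaround of splitting the file could be scripted roughly as follows. This is a sketch under assumptions: plan_segments and ffmpeg_split_cmd are hypothetical helper names, the six-hour segment length is an arbitrary choice, and the commands are only built and printed, not executed. The ffmpeg options used (-ss, -t, -map 0:a, -c copy) select a time range, keep only the audio streams, and copy them without re-encoding.

```python
from typing import List, Tuple


def plan_segments(total_s: float, segment_s: float) -> List[Tuple[float, float]]:
    """Return (start, length) pairs in seconds that cover [0, total_s)."""
    segments = []
    start = 0.0
    while start < total_s:
        length = min(segment_s, total_s - start)
        segments.append((start, length))
        start += segment_s
    return segments


def ffmpeg_split_cmd(src: str, dst: str, start: float, length: float) -> List[str]:
    """Build an ffmpeg command that copies one time range of the audio."""
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",   # seek to the segment start
        "-t", f"{length:.3f}",   # limit to the segment length
        "-i", src,
        "-map", "0:a",           # audio only; drops cover art and text tracks
        "-c", "copy",            # no re-encoding
        dst,
    ]


if __name__ == "__main__":
    total = 21 * 3600 + 2 * 60 + 6.68  # 21:02:06.68
    for i, (start, length) in enumerate(plan_segments(total, 6 * 3600)):
        print(" ".join(ffmpeg_split_cmd("full.m4a", f"part_{i:02d}.m4a", start, length)))
```

Each resulting part can then be passed to mlx_whisper separately; timestamps in later parts would need the segment start added back.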

➜  Downloads ffmpeg -i 1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
  built with Apple clang version 16.0.0 (clang-1600.0.26.6)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.1.1_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.101 / 61. 19.101
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x131605950] stream 0, timescale not set
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1f7afc6c-246e-4d6c-943b-0223cc4f27e5_full.m4a':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: iso2mp41M4A M4B
    creation_time   : 2024-02-27T23:08:14.000000Z
    genre           : Fantasy, Space Opera
    title           : Dune
    artist          : Frank Herbert
    album_artist    : Frank Herbert
    album           : Dune
    comment         : Here is the novel that will be forever considered a triumph of the imagination. Set on the desert planet Arrakis, \nDune is the story of the boy Paul Atreides, who would become the mysterious man known as Maud'dib. He would avenge the traitorous plot agai
    copyright       : ©1965  Frank Herbert ℗2007  Audio Renaissance, a division of Holtzbrinck Publishers LLC
    date            : 2006
    composer        : Scott Brick, Orlagh Cassidy, Euan Morton, Simon Vance, Ilyana Kadushin, Byron Jennings, David R. Gordon, Jason Culp, Kent Broadhurst, Oliver Wyman, Patricia Kilgarriff, Scott Sowers
    PUBLISHER       : Macmillan Audio
    LANGUAGE        : English
    AUDIBLE_ASIN    : B002V1OF70
    SERIES          : Dune
    PART            : 1
  Duration: 21:02:06.68, start: 0.000000, bitrate: 127 kb/s
  Chapters:
    Chapter #0:0: start 0.000000, end 57.000000
      Metadata:
        title           : Opening Credits
    Chapter #0:1: start 57.000000, end 115.200000
      Metadata:
        title           : Book One: Dune
    Chapter #0:2: start 115.200000, end 1791.228980
      Metadata:
        title           : Chapter 1
    Chapter #0:3: start 1791.228980, end 3044.599977
      Metadata:
        title           : Chapter 2
    Chapter #0:4: start 3044.599977, end 4040.828957
      Metadata:
        title           : Chapter 3
    Chapter #0:5: start 4040.828957, end 5426.828957
      Metadata:
        title           : Chapter 4
    Chapter #0:6: start 5426.828957, end 6029.278957
      Metadata:
        title           : Chapter 5
    Chapter #0:7: start 6029.278957, end 6967.258957
      Metadata:
        title           : Chapter 6
    Chapter #0:8: start 6967.258957, end 8584.276939
      Metadata:
        title           : Chapter 7
    Chapter #0:9: start 8584.276939, end 9890.276939
      Metadata:
        title           : Chapter 8
    Chapter #0:10: start 9890.276939, end 10470.276939
      Metadata:
        title           : Chapter 9
    Chapter #0:11: start 10470.276939, end 11704.276939
      Metadata:
        title           : Chapter 10
    Chapter #0:12: start 11704.276939, end 12469.276939
      Metadata:
        title           : Chapter 11
    Chapter #0:13: start 12469.276939, end 14789.369932
      Metadata:
        title           : Chapter 12
    Chapter #0:14: start 14789.369932, end 15531.869932
      Metadata:
        title           : Chapter 13
    Chapter #0:15: start 15531.869932, end 16026.869932
      Metadata:
        title           : Chapter 14
    Chapter #0:16: start 16026.869932, end 19187.135918
      Metadata:
        title           : Chapter 15
    Chapter #0:17: start 19187.135918, end 22429.405918
      Metadata:
        title           : Chapter 16
    Chapter #0:18: start 22429.405918, end 24278.405918
      Metadata:
        title           : Chapter 17
    Chapter #0:19: start 24278.405918, end 25034.405918
      Metadata:
        title           : Chapter 18
    Chapter #0:20: start 25034.405918, end 26578.918912
      Metadata:
        title           : Chapter 19
    Chapter #0:21: start 26578.918912, end 26974.918912
      Metadata:
        title           : Chapter 20
    Chapter #0:22: start 26974.918912, end 28985.918912
      Metadata:
        title           : Chapter 21
    Chapter #0:23: start 28985.918912, end 31186.713900
      Metadata:
        title           : Chapter 22
    Chapter #0:24: start 31186.713900, end 31192.713900
      Metadata:
        title           : Book Two: Muad’Dib
    Chapter #0:25: start 31192.713900, end 31900.713900
      Metadata:
        title           : Chapter 23
    Chapter #0:26: start 31900.713900, end 33575.713900
      Metadata:
        title           : Chapter 24
    Chapter #0:27: start 33575.713900, end 35465.301882
      Metadata:
        title           : Chapter 25
    Chapter #0:28: start 35465.301882, end 36966.084875
      Metadata:
        title           : Chapter 26
    Chapter #0:29: start 36966.084875, end 39195.084875
      Metadata:
        title           : Chapter 27
    Chapter #0:30: start 39195.084875, end 40109.954875
      Metadata:
        title           : Chapter 28
    Chapter #0:31: start 40109.954875, end 41569.954875
      Metadata:
        title           : Chapter 29
    Chapter #0:32: start 41569.954875, end 42779.954875
      Metadata:
        title           : Chapter 30
    Chapter #0:33: start 42779.954875, end 44512.091859
      Metadata:
        title           : Chapter 31
    Chapter #0:34: start 44512.091859, end 45852.091859
      Metadata:
        title           : Chapter 32
    Chapter #0:35: start 45852.091859, end 47897.091859
      Metadata:
        title           : Chapter 33
    Chapter #0:36: start 47897.091859, end 49880.559841
      Metadata:
        title           : Chapter 34
    Chapter #0:37: start 49880.559841, end 52717.559841
      Metadata:
        title           : Chapter 35
    Chapter #0:38: start 52717.559841, end 54110.665828
      Metadata:
        title           : Chapter 36
    Chapter #0:39: start 54110.665828, end 56427.665828
      Metadata:
        title           : Chapter 37
    Chapter #0:40: start 56427.665828, end 56432.665828
      Metadata:
        title           : Book Three: The Prophet
    Chapter #0:41: start 56432.665828, end 57658.410816
      Metadata:
        title           : Chapter 38
    Chapter #0:42: start 57658.410816, end 58710.410816
      Metadata:
        title           : Chapter 39
    Chapter #0:43: start 58710.410816, end 60605.165805
      Metadata:
        title           : Chapter 40
    Chapter #0:44: start 60605.165805, end 62134.165805
      Metadata:
        title           : Chapter 41
    Chapter #0:45: start 62134.165805, end 63206.165805
      Metadata:
        title           : Chapter 42
    Chapter #0:46: start 63206.165805, end 65643.945805
      Metadata:
        title           : Chapter 43
    Chapter #0:47: start 65643.945805, end 67658.179796
      Metadata:
        title           : Chapter 44
    Chapter #0:48: start 67658.179796, end 69265.179796
      Metadata:
        title           : Chapter 45
    Chapter #0:49: start 69265.179796, end 70546.179796
      Metadata:
        title           : Chapter 46
    Chapter #0:50: start 70546.179796, end 72122.006780
      Metadata:
        title           : Chapter 47
    Chapter #0:51: start 72122.006780, end 75677.006780
      Metadata:
        title           : Chapter 48
    Chapter #0:52: start 75677.006780, end 75726.648776
      Metadata:
        title           : End Credits
  Stream #0:0[0x1](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
      Metadata:
        creation_time   : 2024-02-27T23:08:14.000000Z
        handler_name    : ?Apple Sound Media Handler
        vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Data: bin_data (text / 0x74786574) (default)
      Metadata:
        creation_time   : 2024-02-27T23:08:15.000000Z
        handler_name    : ?Apple Text Media Handler
  Stream #0:2[0x0]: Video: mjpeg (Progressive), yuvj420p(pc, bt470bg/unknown/unknown), 2400x2400 [SAR 1:1 DAR 1:1], 90k tbr, 90k tbn (attached pic)
At least one output file must be specified
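
For scale, the reported duration of 21:02:06.68 can be turned into a rough memory estimate. The sketch below assumes the audio is decoded to 16 kHz mono float32 before feature extraction and uses Whisper's usual STFT parameters (400-sample window, 160-sample hop, 80 mel bins for the tiny model). Which of these intermediates mlx-whisper actually holds in memory at the same time is an assumption, and estimate_bytes is a hypothetical helper. Under those assumptions the waveform alone comes to roughly 5 GB and a full complex STFT to roughly 12 GB.

```python
SAMPLE_RATE = 16_000  # Whisper's input sample rate (Hz)
N_FFT = 400           # STFT window length, in samples
HOP_LENGTH = 160      # STFT hop length, in samples
N_MELS = 80           # mel bins used by the tiny model


def estimate_bytes(duration_s: float) -> dict:
    """Rough sizes of the main arrays for one unbatched pass over the audio."""
    n_samples = round(duration_s * SAMPLE_RATE)
    n_frames = n_samples // HOP_LENGTH
    n_bins = N_FFT // 2 + 1  # one-sided spectrum
    return {
        "waveform_float32": n_samples * 4,
        "stft_complex64": n_frames * n_bins * 8,  # assumed dtype
        "log_mel_float32": n_frames * N_MELS * 4,
    }


if __name__ == "__main__":
    duration_s = 21 * 3600 + 2 * 60 + 6.68  # Duration: 21:02:06.68
    for name, size in estimate_bytes(duration_s).items():
        print(f"{name:>18}: {size / 1e9:.2f} GB")
```

The final log mel array is the smallest of the three, which is consistent with the suggestion that the spectrogram computation, rather than the 30-second decoding loop, is where memory runs out.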

stickystyle • May 08 '25 20:05