
whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU

Open h9j6k opened this issue 1 year ago • 0 comments

Hello,

I am on macOS 12 with whisper.cpp v1.6.2 installed through Homebrew (modified formula, compiled with WHISPER_METAL_EMBED_LIBRARY=1). I also followed https://github.com/orgs/Homebrew/discussions/4292 and added export GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp"

When I ran whisper-cpp -f samples/jfk.wav -m models/ggml-base.en.bin, I saw whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU, which means the GPU is not used for transcription.

Here is the verbose output:

whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Graphics
ggml_metal_init: picking default device: Intel Iris Graphics
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Intel Iris Graphics
ggml_metal_init: GPU family: MTLGPUFamilyCommon2 (3002)
ggml_metal_init: simdgroup reduction support   = false
ggml_metal_init: simdgroup matrix mul. support = false
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  1610.61 MB
ggml_metal_init: skipping kernel_soft_max_f16                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4                    (not supported)
ggml_metal_init: skipping kernel_soft_max_f32                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f32_4                    (not supported)
ggml_metal_init: skipping kernel_rms_norm                          (not supported)
ggml_metal_init: skipping kernel_group_norm                        (not supported)
ggml_metal_init: skipping kernel_mul_mv_f32_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f16                    (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32_1row               (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32_l4                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q8_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q2_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q3_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_q6_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq3_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq3_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq1_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq1_m_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq4_nl_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq4_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_f32_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_f16_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q8_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q2_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q3_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q6_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_xs_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq3_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq3_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq1_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq1_m_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq4_nl_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq4_xs_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mm_f32_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mm_f16_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q8_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q2_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q3_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q6_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_m_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_nl_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f32_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f16_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q8_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q2_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q3_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q6_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xs_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_m_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_nl_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_xs_f32              (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h64            (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h80            (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h96            (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h112           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h128           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h256           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_f16_h128       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_f16_h256       (not supported)
whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU
ggml_metal_free: deallocating
whisper_model_load:      CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using Metal backend
whisper_init_state: kv self size  =   18.87 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  135.14 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =   371.55 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    34.51 ms
whisper_print_timings:   sample time =   126.26 ms /   131 runs (    0.96 ms per run)
whisper_print_timings:   encode time =  3571.59 ms /     1 runs ( 3571.59 ms per run)
whisper_print_timings:   decode time =    19.55 ms /     2 runs (    9.77 ms per run)
whisper_print_timings:   batchd time =   775.25 ms /   125 runs (    6.20 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  4965.58 ms
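
Since the log is long, a small filter can pull out the three facts that matter here: the reported GPU family, how many kernels were skipped, and whether the CPU fallback happened. The following is only a sketch that assumes the log format shown above; the function name summarize_metal_log is hypothetical and not part of whisper.cpp.

```shell
#!/bin/sh
# Summarize the Metal-related lines of a whisper.cpp log read from stdin.
# Assumption: the log lines look like the ones pasted in this issue.
# summarize_metal_log is a hypothetical helper name, not a whisper.cpp tool.
summarize_metal_log() {
    awk '
        # Keep only the text after "GPU family:" from the ggml_metal_init line.
        /ggml_metal_init: GPU family:/      { sub(/^.*GPU family:[ ]*/, ""); family = $0 }
        # Count every kernel that ggml_metal_init reports as skipped.
        /ggml_metal_init: skipping kernel_/ { skipped++ }
        # Remember whether the fallback message appeared at all.
        /falling back to CPU/               { fallback = 1 }
        END {
            printf "gpu_family=%s\n", (family == "" ? "unknown" : family)
            printf "skipped_kernels=%d\n", skipped
            printf "cpu_fallback=%s\n", (fallback ? "yes" : "no")
        }
    '
}
```

Assuming the log is written to stderr, it could be used as whisper-cpp -f samples/jfk.wav -m models/ggml-base.en.bin 2>&1 | summarize_metal_log, or fed from a saved log file.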

According to Support for Metal on Mac, iPad, and iPhone, the GPU itself supports features up to Common family 2:

https://developer.apple.com/documentation/metal/mtlgpufamily/common2

The OS, macOS 12, supports Metal language versions up to 2_4:

https://developer.apple.com/documentation/metal/mtllanguageversion/mtllanguageversion2_4

Is there any plan for whisper.cpp to expand Metal coverage so that older Macs can use GPUs such as Intel Iris Graphics?

Thanks.

h9j6k, Jun 27 '24 02:06