whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU
Hello,
I am on macOS 12 with whisper.cpp v1.6.2 installed via Homebrew (modified formula, compiled with WHISPER_METAL_EMBED_LIBRARY=1). I also followed https://github.com/orgs/Homebrew/discussions/4292 and added export GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp"
When I ran whisper-cpp -f samples/jfk.wav -m models/ggml-base.en.bin, I saw whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU, which means the GPU is not used for transcription.
Here is the verbose output:
whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Graphics
ggml_metal_init: picking default device: Intel Iris Graphics
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Intel Iris Graphics
ggml_metal_init: GPU family: MTLGPUFamilyCommon2 (3002)
ggml_metal_init: simdgroup reduction support = false
ggml_metal_init: simdgroup matrix mul. support = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 1610.61 MB
ggml_metal_init: skipping kernel_soft_max_f16 (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4 (not supported)
ggml_metal_init: skipping kernel_soft_max_f32 (not supported)
ggml_metal_init: skipping kernel_soft_max_f32_4 (not supported)
ggml_metal_init: skipping kernel_rms_norm (not supported)
ggml_metal_init: skipping kernel_group_norm (not supported)
ggml_metal_init: skipping kernel_mul_mv_f32_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_f16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q8_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q2_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q3_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q4_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q5_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_q6_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq3_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq3_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq2_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq1_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq1_m_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq4_nl_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_iq4_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_f32_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_f16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q8_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q2_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q3_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q4_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q5_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_q6_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq3_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq3_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq2_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq1_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq1_m_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq4_nl_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_iq4_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_f32_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_f16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q8_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q2_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q3_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_q6_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_m_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_nl_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f32_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_1_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q8_0_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q2_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q3_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q6_K_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_xxs_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_s_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_m_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_nl_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_xs_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_f16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_f16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_f16_h256 (not supported)
whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU
ggml_metal_free: deallocating
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init: using Metal backend
whisper_init_state: kv self size = 18.87 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 135.14 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 371.55 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 34.51 ms
whisper_print_timings: sample time = 126.26 ms / 131 runs ( 0.96 ms per run)
whisper_print_timings: encode time = 3571.59 ms / 1 runs ( 3571.59 ms per run)
whisper_print_timings: decode time = 19.55 ms / 2 runs ( 9.77 ms per run)
whisper_print_timings: batchd time = 775.25 ms / 125 runs ( 6.20 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 4965.58 ms
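To make the log above easier to read at a glance, here is a minimal Python sketch that pulls out the key facts: the reported GPU family, the skipped kernels, whether the CPU fallback happened, and how much of the total time went to the encoder. The `summarize` helper and its regular expressions are my own illustration, not part of whisper.cpp, and the embedded `LOG` holds only a few lines copied from the output above.

```python
import re

# A few lines copied from the log above; paste the full output to analyse all of it.
LOG = """\
ggml_metal_init: GPU name: Intel Iris Graphics
ggml_metal_init: GPU family: MTLGPUFamilyCommon2 (3002)
ggml_metal_init: skipping kernel_soft_max_f16 (not supported)
ggml_metal_init: skipping kernel_mul_mm_f16_f32 (not supported)
whisper_backend_init: Metal GPU does not support family 7 - falling back to CPU
whisper_print_timings: encode time = 3571.59 ms / 1 runs ( 3571.59 ms per run)
whisper_print_timings: total time = 4965.58 ms
"""


def summarize(log: str) -> dict:
    """Hypothetical helper: extract a short summary from whisper.cpp log text."""
    family = re.search(r"GPU family: (\w+) \((\d+)\)", log)
    skipped = re.findall(r"skipping (kernel_\w+) \(not supported\)", log)
    encode = re.search(r"encode time =\s*([\d.]+) ms", log)
    total = re.search(r"total time =\s*([\d.]+) ms", log)

    encode_share = None
    if encode and total:
        # Fraction of the total run time spent in the encoder.
        encode_share = float(encode.group(1)) / float(total.group(1))

    return {
        "gpu_family": family.group(1) if family else None,
        "gpu_family_raw": int(family.group(2)) if family else None,
        "skipped_kernels": skipped,
        "cpu_fallback": "falling back to CPU" in log,
        "encode_share": encode_share,
    }


summary = summarize(LOG)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the full log pasted in, the skipped-kernel list shows which Metal kernels the device could not load, and the encoder share shows where the CPU fallback costs the most time.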
According to Apple's "Support for Metal on Mac, iPad, and iPhone" page, the GPU itself supports features up to the Common 2 GPU family:
https://developer.apple.com/documentation/metal/mtlgpufamily/common2
The OS, macOS 12, supports the Metal Shading Language up to version 2.4:
https://developer.apple.com/documentation/metal/mtllanguageversion/mtllanguageversion2_4
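To spell out the mismatch in numbers: the log reports the device as MTLGPUFamilyCommon2 (3002), while the fallback message asks for "family 7". My reading, which is an assumption and not something the log confirms, is that "family 7" means the Apple 7 GPU family rather than a level within the Common families. The Python sketch below only illustrates that arithmetic. The enum values are the raw MTLGPUFamily values as I understand them from Apple's headers (the log confirms only 3002), and `required_family` is a hypothetical helper, not a whisper.cpp function.

```python
from enum import IntEnum


class MTLGPUFamily(IntEnum):
    """Subset of Metal GPU family raw values (assumed from Apple's headers)."""
    APPLE1 = 1001
    APPLE7 = 1007
    COMMON1 = 3001
    COMMON2 = 3002   # the value printed in the log above
    COMMON3 = 3003


def required_family(n: int) -> MTLGPUFamily:
    """Assumed mapping: 'family n' in the log message means MTLGPUFamilyApple<n>.

    Raises ValueError for values of n that are not listed in the subset above.
    """
    return MTLGPUFamily(MTLGPUFamily.APPLE1 + n - 1)


reported = MTLGPUFamily(3002)   # from "GPU family: MTLGPUFamilyCommon2 (3002)"
required = required_family(7)   # from "does not support family 7"

print(f"reported: {reported.name} ({reported.value})")
print(f"required: {required.name} ({required.value})")
print(f"same family: {reported == required}")
```

If that reading is right, the Common and Apple families are separate lines rather than points on one scale, so a device reporting Common 2 would fail the check regardless of the Metal language version the OS supports.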
Is there any plan for whisper.cpp to extend Metal support to older Macs so that GPUs such as Intel Iris Graphics can be used?
Thanks.