whisper.cpp
Reorganize POWER9 SIMD code
I could not eliminate the separate index argument in the f16 load and store macros, so this patch set needs testing on other architectures.
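To make the index argument concrete for reviewers on other architectures, here is a minimal, portable sketch of one plausible reason a load macro might need it. It assumes a platform where a single wide load covers eight narrow elements, so the parity of the vector index selects which half to widen. All names here (`f32x4_sketch`, `widen_sketch`, `VEC_LOAD_WIDE_SKETCH`, and so on) are hypothetical stand-ins, not ggml macros, and `int16_t` is used in place of a real f16 type to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4-lane float SIMD register; not a ggml type. */
typedef struct { float lane[4]; } f32x4_sketch;

/* Placeholder for an f16 -> f32 conversion (assumption: plain widening). */
static float widen_sketch(int16_t v) { return (float) v; }

/* "Narrow" variant: each call reads exactly 4 elements starting at p. */
static f32x4_sketch load_narrow_sketch(const int16_t *p) {
    f32x4_sketch r;
    for (int k = 0; k < 4; ++k) r.lane[k] = widen_sketch(p[k]);
    return r;
}

/* "Wide" variant: the underlying load covers 8 elements, so the vector
 * index i decides where the 8-element chunk starts and which half of it
 * is widened. */
static f32x4_sketch load_wide_sketch(const int16_t *p, int i) {
    assert(i >= 0);
    const int16_t *chunk = (i & 1) ? p - 4 : p; /* start of the 8-element chunk */
    int first = (i & 1) ? 4 : 0;                /* low half or high half */
    f32x4_sketch r;
    for (int k = 0; k < 4; ++k) r.lane[k] = widen_sketch(chunk[first + k]);
    return r;
}

/* Both macros share one call shape; the narrow one ignores the index. */
#define VEC_LOAD_NARROW_SKETCH(p, i) load_narrow_sketch(p)
#define VEC_LOAD_WIDE_SKETCH(p, i)   load_wide_sketch((p), (i))
```

With a shared `(p, i)` signature, every architecture's call sites have to pass the index even when the macro discards it, which is why a change like this touches code paths beyond POWER9.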
The existing GGML_F32x4_REDUCE macro performs as well as the implementation in #366, so I kept the existing one.
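For context, a reduce step of this kind collapses several 4-lane accumulators into one scalar. The following is a rough, portable sketch of that general pattern, not the actual body of GGML_F32x4_REDUCE. The type `f32x4_sketch` and both function names are hypothetical, and the pairing order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 4-lane float SIMD register; not a ggml type. */
typedef struct { float lane[4]; } f32x4_sketch;

/* Lane-wise addition of two 4-lane values. */
static f32x4_sketch f32x4_add_sketch(f32x4_sketch a, f32x4_sketch b) {
    f32x4_sketch r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Reduce n accumulators (n must be a power of two) to a single scalar.
 * The accumulator array is modified in place. */
static float f32x4_reduce_sketch(f32x4_sketch *acc, size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    /* Tree step: fold the upper half onto the lower half until one remains. */
    for (size_t half = n / 2; half > 0; half /= 2) {
        for (size_t i = 0; i < half; ++i) {
            acc[i] = f32x4_add_sketch(acc[i], acc[i + half]);
        }
    }
    /* Horizontal sum of the four lanes of the surviving accumulator. */
    return (acc[0].lane[0] + acc[0].lane[1]) + (acc[0].lane[2] + acc[0].lane[3]);
}
```

Because floating-point addition is not associative, two reduce implementations that pair the accumulators differently can give slightly different results, so it is worth comparing output as well as speed.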
When I test with the F32 model, ggml_vec_dot_f16 and ggml_vec_mad_f16 are still being called. Is that expected?