whisper.cpp Reorganize POWER9 SIMD code


fitzsim opened this issue.

I could not eliminate the separate index argument from the f16 load and store macros, so this patch set needs testing on other architectures.
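Here is one plausible reason an f16 load macro might need an index, shown as a portable emulation rather than real VSX code. This is a hedged sketch, not the patch's implementation. It assumes the POWER9 path always performs a full 16-byte load of eight f16 values and widens only the lower or upper four to f32. Under that assumption, the parity of the index selects which half to take. The names `f16_vec_load`, `load_8_halves`, `half_to_float`, `f32x4` and `F16_EPR` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define F16_EPR 4 /* hypothetical: f32 lanes per 128-bit vector */

/* Four f32 lanes, standing in for one 128-bit vector register. */
typedef struct { float lane[F16_EPR]; } f32x4;

/* Convert IEEE 754 binary16 bits to binary32 (zero, subnormal, normal, inf, NaN). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                       /* signed zero */
        } else {
            exp = 127 - 15 + 1;                /* normalize the subnormal */
            while ((mant & 0x400u) == 0) { mant <<= 1; exp--; }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13); /* inf or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Emulates one 16-byte vector load: eight consecutive f16 values from `base`. */
static void load_8_halves(const uint16_t *base, uint16_t out[8]) {
    memcpy(out, base, 8 * sizeof(uint16_t));
}

/* Hypothetical indexed load. Even i: load 8 halves at p and widen the first 4.
   Odd i: load 8 halves at p - F16_EPR and widen the last 4.
   Both yield p[0..3], but only the index tells the macro which half to extract. */
static f32x4 f16_vec_load(const uint16_t *p, int i) {
    uint16_t raw[8];
    int odd = i & 1;
    load_8_halves(odd ? p - F16_EPR : p, raw);
    f32x4 r;
    for (int k = 0; k < F16_EPR; k++) {
        r.lane[k] = half_to_float(raw[(odd ? F16_EPR : 0) + k]);
    }
    return r;
}
```

On an architecture whose conversion instruction handles exactly four halves at once, a macro like this could simply ignore the index. That is why the extra argument is harmless there but still needs to be exercised.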

The existing GGML_F32x4_REDUCE macro performs as well as the implementation in #366, so I kept the existing one.
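A reduce macro of this kind collapses several independent vector accumulators into a single scalar. The following is a hedged scalar sketch of that shape, not the ggml macro itself. It assumes 8 accumulators of 4 lanes each, combined in a pairwise tree and then summed horizontally across lanes. The names `ARR`, `EPR`, `f32x4`, `f32x4_add` and `f32x4_reduce` are hypothetical.

```c
#define ARR 8 /* hypothetical number of vector accumulators */
#define EPR 4 /* f32 lanes per vector */

typedef struct { float lane[EPR]; } f32x4;

/* Lane-wise addition, standing in for a vector add instruction. */
static f32x4 f32x4_add(f32x4 a, f32x4 b) {
    f32x4 r;
    for (int k = 0; k < EPR; k++) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

/* Pairwise tree reduction of ARR accumulators into x[0] (ARR must be a power of two),
   followed by a horizontal sum of x[0]'s lanes. Like a reduce macro, it overwrites x. */
static float f32x4_reduce(f32x4 x[ARR]) {
    for (int stride = 1; stride < ARR; stride *= 2) {
        for (int i = 0; i + stride < ARR; i += 2 * stride) {
            x[i] = f32x4_add(x[i], x[i + stride]);
        }
    }
    float res = 0.0f;
    for (int k = 0; k < EPR; k++) res += x[0].lane[k];
    return res;
}
```

The tree order keeps dependency chains short, which suits SIMD hardware. Because floating-point addition is not associative, two reduce implementations that use different orders can differ in the last bits. Any comparison between them should therefore allow for that.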

When I test with the F32 model, ggml_vec_dot_f16 and ggml_vec_mad_f16 are still being called. Is that expected?

Jan 04 '23 08:01 fitzsim