whisper.cpp Add SSE3 and Imath support

Adds SSE3 support for SIMD and support for using Imath for fp16-fp32 conversions. On systems where whisper.cpp doesn't already have a native method for the conversion, Imath can be faster because it uses a lookup table; on my system this led to a ~3.5x speed increase.

Jan 03 '23 21:01 abitofevrything

Marking as a draft because I am unsure what values to use for GGML_F32_STEP and GGML_F16_STEP. Guidance on this would be appreciated.

Jan 03 '23 21:01 abitofevrything

A quick test seems to show that 32 leads to better performance than 16 or 64

Jan 04 '23 10:01 abitofevrything

> A quick test seems to show that 32 leads to better performance than 16 or 64

Yes, that's what I do - trial and error to find the best value :)

This is a great contribution. Before merging, I would like to avoid the Imath dependency. We can simply generate a lookup table in ggml.c and use it instead of relying on Imath. Take a look at the existing lookup tables for gelu and exp:

https://github.com/ggerganov/whisper.cpp/blob/a0d4f8e65ca03247ef385552a34be11ef6f1a871/ggml.c#L246-L250

I'm very curious to see if this F16 LUT will speed up the WASM examples, because WASM does not have an intrinsic for FP16 <-> FP32 conversion, so it falls back to the naive conversion method.

Jan 05 '23 19:01 ggerganov

Leaving this as a draft for now because I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function.

A review would still be appreciated, though, as I am almost done with this.

Jan 06 '23 00:01 abitofevrything

Turns out the memcpy calls are optimised out by the compiler anyway :) Marking this as ready.

Jan 06 '23 12:01 abitofevrything

@abitofevrything Good news! As expected, the lookup table improves WASM performance. On a MacBook M1 Pro, I observe a 25% speed-up in Firefox and 35% in Chrome.

Jan 06 '23 16:01 ggerganov