whisper.cpp
Add SSE3 and Imath support
Adds SSE3 SIMD support and support for using Imath for fp16-fp32 conversions. Imath uses a lookup table for the conversion, so it can be faster on systems where whisper.cpp doesn't already have a native conversion method; on my system this leads to an ~3.5x speed increase.
Drafting for now as I'm unsure what values to use for GGML_F32_STEP and GGML_F16_STEP - guidance on this would be appreciated.
A quick test seems to show that 32 leads to better performance than 16 or 64
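For context, this is roughly how the macros relate, mirroring the pattern of the existing AVX/NEON sections in ggml.c (illustrative sketch only - the STEP value is exactly what's being tuned here, and the F16 path follows the same layout):

```c
// Sketch of the SSE3 SIMD macros, following the shape of the other
// architecture sections in ggml.c. The STEP value is the one under
// discussion, not a settled constant.
#define GGML_F32_STEP 32                            // elements handled per unrolled inner-loop iteration
#define GGML_F32_EPR  4                             // floats per 128-bit __m128 register
#define GGML_F32_ARR  (GGML_F32_STEP/GGML_F32_EPR)  // registers kept in flight per iteration (here: 8)
```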
> A quick test seems to show that 32 leads to better performance than 16 or 64
Yes, that's what I do - trial and error to find the best value :)
This is a great contribution.
Before merging, I would like to avoid the Imath dependency.
We can simply generate a lookup table in ggml.c and use it instead of relying on Imath.
Take a look at the existing lookup tables for gelu and exp:
https://github.com/ggerganov/whisper.cpp/blob/a0d4f8e65ca03247ef385552a34be11ef6f1a871/ggml.c#L246-L250
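Something along these lines, for illustration (hypothetical names, not the actual ggml.c code): fill a 65536-entry table once at init time using a portable bit-twiddling conversion, then every fp16 -> fp32 conversion becomes a single array lookup.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t ggml_fp16_t;

// one float per possible fp16 bit pattern: 65536 entries * 4 bytes = 256 KiB
static float table_f16_f32[1 << 16];

// portable fp16 -> fp32 conversion, only used to fill the table once
static float fp16_to_fp32_naive(ggml_fp16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    const uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t       mant = h & 0x3ff;
    uint32_t       bits;

    if (exp == 0x1f) {
        bits = sign | 0x7f800000u | (mant << 13);          // inf / NaN
    } else if (exp != 0) {
        bits = sign | ((exp + 112) << 23) | (mant << 13);  // normal number
    } else if (mant == 0) {
        bits = sign;                                       // +/- zero
    } else {
        int e = -1;                                        // subnormal: renormalize
        do { e++; mant <<= 1; } while ((mant & 0x400) == 0);
        bits = sign | ((uint32_t)(112 - e) << 23) | ((mant & 0x3ff) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

// called once at startup, like the existing gelu/exp table initialization
static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        table_f16_f32[i] = fp16_to_fp32_naive((ggml_fp16_t) i);
    }
}

// hot path: conversion is just an indexed load
static inline float ggml_lookup_fp16_to_fp32(ggml_fp16_t h) {
    return table_f16_f32[h];
}
```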
I'm very curious to see if this F16 LUT will speed up the WASM examples, because WASM does not have an intrinsic for FP16 <-> FP32 conversion, so it falls back to the naive conversion method.
Leaving this as a draft for now as I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function.
A review would still be appreciated though, as I'm almost done with this.
Turns out the memcpy calls are optimised out by the compiler anyway :) Marking this as ready.
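For reference, this is the pattern I mean (sketch, not the exact PR code): the memcpy is the strict-aliasing-safe way to reinterpret the bits, and with optimizations enabled compilers turn it into a plain register move, so the "copy" has no runtime cost.

```c
#include <stdint.h>
#include <string.h>

// strict-aliasing-safe bit reinterpretation; optimizing compilers emit a
// single move here, not an actual call to memcpy
static inline float fp32_from_bits(uint32_t w) {
    float f;
    memcpy(&f, &w, sizeof(f));
    return f;
}
```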
@abitofevrything Good news! As expected, the lookup table improves the WASM performance. On a MacBook M1 Pro, I observe 25% faster performance using Firefox and 35% faster using Chrome.