whisper.cpp
PPC64 big-endian support
For the sake of completing the POWER port, it would be nice if big-endian worked.
I made some progress on the primitives involved, but I could use some overall guidance. Do you think it is a worthwhile pursuit, or are there too many little-endian assumptions in the code? It would be fine if the model had to be repacked, for example with an option to use fout.write(struct.pack(">... everywhere in convert-pt-to-ggml.py. Byte reversals also seem inexpensive on POWER SIMD.
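To make the ">" versus "<" packing concrete: Python's struct.pack(">I", v) writes the most significant byte first, and "<I" writes the least significant byte first. A minimal C++ sketch of the two layouts (encode_be32 and encode_le32 are hypothetical helper names, not part of whisper.cpp):

```cpp
#include <array>
#include <cstdint>

// Most-significant byte first: the layout struct.pack(">I", v) produces.
std::array<uint8_t, 4> encode_be32(uint32_t v) {
    return {{
        static_cast<uint8_t>(v >> 24),
        static_cast<uint8_t>(v >> 16),
        static_cast<uint8_t>(v >> 8),
        static_cast<uint8_t>(v),
    }};
}

// Least-significant byte first: the layout struct.pack("<I", v) produces.
std::array<uint8_t, 4> encode_le32(uint32_t v) {
    return {{
        static_cast<uint8_t>(v),
        static_cast<uint8_t>(v >> 8),
        static_cast<uint8_t>(v >> 16),
        static_cast<uint8_t>(v >> 24),
    }};
}
```

For v = 0x67676d6c (the bytes spell "ggml" in ASCII), encode_be32 gives 67 67 6d 6c and encode_le32 gives 6c 6d 67 67. Because the bytes are produced with shifts rather than by copying memory, both functions give the same result on any host.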
I think the only place that has little-endian assumptions is in the FP16 <-> FP32 naive conversion.
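For comparison, here is a sketch of a binary16 to binary32 decode that works purely on the 16-bit integer value. It is not ggml's conversion code. Once the uint16_t already holds the right value, nothing in it depends on host byte order, so the endianness question reduces to how those 16 bits are read from the file.

```cpp
#include <cstdint>
#include <cstring>

// Sketch: decode IEEE 754 binary16 bits (already in host order) to float.
float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && mant == 0) {
        bits = sign;                                            // signed zero
    } else if (exp == 0) {
        // Subnormal half: shift the mantissa up until the implicit bit appears.
        exp = 127 - 15 + 1;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            --exp;
        }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);               // infinity or NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // normal: rebias exponent
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);                           // reinterpret without aliasing UB
    return f;
}
```

Applied to the sy[7] lanes in the dumps below, 0xb778 decodes to -0.466796875, while 0x007c decodes to 124 × 2^-24 (about 7.4e-6, a subnormal).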
What would change in the POWER port if it supported big-endian, compared to the existing port? I don't see any of the intrinsics depending on endianness.
To get the model loading, I modified read_safe as follows:
if constexpr (std::endian::native == std::endian::big) {
dest = std::byteswap(dest);
}
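One caveat with the snippet above: std::byteswap (C++23) only accepts integer types, so it cannot swap a float field, and std::endian needs C++20. A sketch that sidesteps both, assuming the value's bytes are already in a buffer, is to reverse the raw bytes before copying them into the destination. load_le and host_is_big_endian are hypothetical names, not whisper.cpp functions.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Runtime host-endianness probe (C++11); with C++20 this could be
// std::endian::native == std::endian::big instead.
inline bool host_is_big_endian() {
    const uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

// Hypothetical helper: decode one little-endian value of any trivially
// copyable type (integers and floats alike) from a raw byte pointer.
template <typename T>
T load_le(const unsigned char * src) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, src, sizeof(T));
    if (host_is_big_endian()) {
        std::reverse(buf, buf + sizeof(T));  // file bytes are little-endian
    }
    T out;
    std::memcpy(&out, buf, sizeof(T));
    return out;
}
```

Reversing bytes in a buffer instead of calling std::byteswap on the value is what lets the same template handle float as well as int32_t.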
Then, running with -t 1 for consistency, I print out the register contents as ggml_fp16_t in the first call to ggml_vec_dot_f16. Here is the output on little-endian:
0: sx[0]: 2875 a902 a86b 2e3b 1f37 a8ea b122 ae1d
0: sy[0]: 0 0 0 0 0 0 0 0
0: sx[1]: 1f37 a8ea b122 ae1d 2f3d 3254 2a08 ae8c
0: sy[1]: 0 0 0 0 0 0 0 0
0: sx[2]: 2f3d 3254 2a08 ae8c ae59 2720 2c7e 205b
0: sy[2]: 0 0 0 0 0 0 0 0
0: sx[3]: ae59 2720 2c7e 205b a8bd a50d 234b 2456
0: sy[3]: 0 0 0 0 0 0 0 0
0: sx[4]: a8bd a50d 234b 2456 9cc1 a40c 910d 1998
0: sy[4]: 0 0 0 0 0 0 0 0
0: sx[5]: 9cc1 a40c 910d 1998 1a9e 925b 1166 1992
0: sy[5]: 0 0 0 0 0 0 0 0
0: sx[6]: 1a9e 925b 1166 1992 1476 1e74 a200 9de5
0: sy[6]: 0 0 0 0 0 0 0 0
0: sx[7]: 1476 1e74 a200 9de5 1e90 9d5e a0b1 9e2d
0: sy[7]: 0 0 0 0 0 0 0 0
32: sx[0]: 1e90 9d5e a0b1 9e2d 9318 1de6 1da4 1e1e
32: sy[0]: 0 0 0 0 0 0 0 0
32: sx[1]: 9318 1de6 1da4 1e1e 99dc a0ca 193f 1dc3
32: sy[1]: 0 0 0 0 0 0 0 0
32: sx[2]: 99dc a0ca 193f 1dc3 1d48 965c 9505 98bf
32: sy[2]: 0 0 0 0 0 0 0 0
32: sx[3]: 1d48 965c 9505 98bf 9c49 9d61 9688 177c
32: sy[3]: 0 0 0 0 0 0 0 0
32: sx[4]: 9c49 9d61 9688 177c 1e07 8c58 8cab 1a06
32: sy[4]: 0 0 0 0 0 0 0 0
32: sx[5]: 1e07 8c58 8cab 1a06 98f0 99ed 1197 8e5f
32: sy[5]: 0 0 0 0 0 0 0 0
32: sx[6]: 98f0 99ed 1197 8e5f 1ac6 1b7b 1a2f 1d07
32: sy[6]: 0 0 0 0 0 0 0 0
32: sx[7]: 1ac6 1b7b 1a2f 1d07 c48 1207 16a4 8ec3
32: sy[7]: 0 0 0 0 0 0 0 0
64: sx[0]: c48 1207 16a4 8ec3 1788 846c 161e 12c8
64: sy[0]: 0 0 0 0 0 0 0 0
64: sx[1]: 1788 846c 161e 12c8 186f 1975 1985 bc8
64: sy[1]: 0 0 0 0 0 0 0 0
64: sx[2]: 186f 1975 1985 bc8 b03 12a3 1c2e 9958
64: sy[2]: 0 0 0 0 0 0 0 0
64: sx[3]: b03 12a3 1c2e 9958 0 0 0 0
64: sy[3]: 0 0 0 0 0 0 0 0
64: sx[4]: 0 0 0 0 0 0 0 0
64: sy[4]: 0 0 0 0 0 0 0 0
64: sx[5]: 0 0 0 0 0 0 0 0
64: sy[5]: 0 0 0 0 0 0 0 0
64: sx[6]: 0 0 0 0 0 0 0 0
64: sy[6]: 0 0 0 0 0 0 0 0
64: sx[7]: 0 0 0 0 2bf7 a9f1 abe1 2e21
64: sy[7]: 0 0 0 0 b778 b778 b778 b778
And on big-endian:
0: sx[0]: 2875 a902 a86b 2e3b 1f37 a8ea b122 ae1d
0: sy[0]: 0 0 0 0 0 0 0 0
0: sx[1]: 1f37 a8ea b122 ae1d 2f3d 3254 2a08 ae8c
0: sy[1]: 0 0 0 0 0 0 0 0
0: sx[2]: 2f3d 3254 2a08 ae8c ae59 2720 2c7e 205b
0: sy[2]: 0 0 0 0 0 0 0 0
0: sx[3]: ae59 2720 2c7e 205b a8bd a50d 234b 2456
0: sy[3]: 0 0 0 0 0 0 0 0
0: sx[4]: a8bd a50d 234b 2456 9cc1 a40c 910d 1998
0: sy[4]: 0 0 0 0 0 0 0 0
0: sx[5]: 9cc1 a40c 910d 1998 1a9e 925b 1166 1992
0: sy[5]: 0 0 0 0 0 0 0 0
0: sx[6]: 1a9e 925b 1166 1992 1476 1e74 a200 9de5
0: sy[6]: 0 0 0 0 0 0 0 0
0: sx[7]: 1476 1e74 a200 9de5 1e90 9d5e a0b1 9e2d
0: sy[7]: 0 0 0 0 0 0 0 0
32: sx[0]: 1e90 9d5e a0b1 9e2d 9318 1de6 1da4 1e1e
32: sy[0]: 0 0 0 0 0 0 0 0
32: sx[1]: 9318 1de6 1da4 1e1e 99dc a0ca 193f 1dc3
32: sy[1]: 0 0 0 0 0 0 0 0
32: sx[2]: 99dc a0ca 193f 1dc3 1d48 965c 9505 98bf
32: sy[2]: 0 0 0 0 0 0 0 0
32: sx[3]: 1d48 965c 9505 98bf 9c49 9d61 9688 177c
32: sy[3]: 0 0 0 0 0 0 0 0
32: sx[4]: 9c49 9d61 9688 177c 1e07 8c58 8cab 1a06
32: sy[4]: 0 0 0 0 0 0 0 0
32: sx[5]: 1e07 8c58 8cab 1a06 98f0 99ed 1197 8e5f
32: sy[5]: 0 0 0 0 0 0 0 0
32: sx[6]: 98f0 99ed 1197 8e5f 1ac6 1b7b 1a2f 1d07
32: sy[6]: 0 0 0 0 0 0 0 0
32: sx[7]: 1ac6 1b7b 1a2f 1d07 c48 1207 16a4 8ec3
32: sy[7]: 0 0 0 0 0 0 0 0
64: sx[0]: c48 1207 16a4 8ec3 1788 846c 161e 12c8
64: sy[0]: 0 0 0 0 0 0 0 0
64: sx[1]: 1788 846c 161e 12c8 186f 1975 1985 bc8
64: sy[1]: 0 0 0 0 0 0 0 0
64: sx[2]: 186f 1975 1985 bc8 b03 12a3 1c2e 9958
64: sy[2]: 0 0 0 0 0 0 0 0
64: sx[3]: b03 12a3 1c2e 9958 0 0 0 0
64: sy[3]: 0 0 0 0 0 0 0 0
64: sx[4]: 0 0 0 0 0 0 0 0
64: sy[4]: 0 0 0 0 0 0 0 0
64: sx[5]: 0 0 0 0 0 0 0 0
64: sy[5]: 0 0 0 0 0 0 0 0
64: sx[6]: 0 0 0 0 0 0 0 0
64: sy[6]: 0 0 0 0 0 0 0 0
64: sx[7]: 0 0 0 0 2bf7 a9f1 abe1 2e21
64: sy[7]: 0 0 0 0 7c 7c 7c 7c
I want to figure out the b778 vs 007c discrepancy. The x inputs seem to just be byteswapped, but the y inputs don't match. I can do the byteswapping via intrinsics (vec_revb), but I don't know how to reconcile the different y inputs.
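A quick way to tell whether a mismatched lane is purely a byte-order problem is to check whether one value equals the other with its two bytes swapped, which is what vec_revb does to each halfword lane. A portable scalar sketch (swap16, LaneMatch and compare_lane are hypothetical names for illustration):

```cpp
#include <cstdint>

enum class LaneMatch { Same, ByteSwapped, Different };

// Swap the two bytes of a 16-bit value (scalar equivalent of a per-halfword byte reversal).
constexpr uint16_t swap16(uint16_t v) {
    return static_cast<uint16_t>((v >> 8) | (v << 8));
}

// Compare the same fp16 lane from a little-endian dump and a big-endian dump.
LaneMatch compare_lane(uint16_t le, uint16_t be) {
    if (le == be) {
        return LaneMatch::Same;
    }
    if (swap16(le) == be) {
        return LaneMatch::ByteSwapped;
    }
    return LaneMatch::Different;
}
```

For sy[7], swap16(0xb778) is 0x78b7, not 0x007c, so that lane classifies as Different. This suggests the y operand is not the same data in the other byte order, and no vec_revb placement would reconcile it.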
There are a few things that do not go through read_safe. For example:
- mel filters: https://github.com/ggerganov/whisper.cpp/blob/52a3e0c92a8be5150d2a59e492b4943ca8a623b0/whisper.cpp#L542-L543
- tensor weights: https://github.com/ggerganov/whisper.cpp/blob/52a3e0c92a8be5150d2a59e492b4943ca8a623b0/whisper.cpp#L1064-L1066
You may need to byte-swap there too.
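For buffers that are read in bulk rather than through read_safe, one option is to read the raw bytes as before and then reverse each element in place on big-endian hosts only. A sketch, where byteswap_buffer is a hypothetical helper and elem_size would be sizeof(float) for the mel filters, or 2 or 4 for f16/f32 tensor data:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Reverse the byte order of each fixed-size element in a buffer, in place.
void byteswap_buffer(void * data, size_t elem_size, size_t n_elems) {
    auto * bytes = static_cast<unsigned char *>(data);
    for (size_t i = 0; i < n_elems; ++i) {
        std::reverse(bytes + i * elem_size, bytes + (i + 1) * elem_size);
    }
}
```

This only fits data made of uniform scalar elements such as f32 or f16; a tensor type whose elements pack several fields of different sizes would need per-field handling.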
#398 is where I'm at. Starting with the 32592001st call to ggml_vec_dot_f16, src1->data points at different data on the big- and little-endian targets. I'm not sure what next steps to take.
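When raw dumps get too long to compare by eye, one debugging approach is to log a hash of each operand's values rather than its bytes, then diff the logs from the two machines. The sketch below is a hypothetical aid, not part of ggml: it runs FNV-1a over fp16 values, feeding each value low byte first regardless of host order, so both runs produce the same hash exactly when the operands hold the same values.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a (32-bit) over fp16 lane values in a host-independent byte order.
uint32_t fnv1a_fp16_values(const uint16_t * data, size_t n) {
    uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
    for (size_t i = 0; i < n; ++i) {
        const uint8_t parts[2] = {
            static_cast<uint8_t>(data[i] & 0xFF),  // low byte first
            static_cast<uint8_t>(data[i] >> 8),    // then high byte
        };
        for (uint8_t b : parts) {
            h ^= b;
            h *= 16777619u;  // FNV 32-bit prime
        }
    }
    return h;
}
```

Logging the call counter alongside the hashes of x and y gives the first diverging call and operand directly. Hashing each graph node's output the same way (an assumption about where to hook, not something tried in this thread) could point at the op that first produces different data for src1.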
Implemented in #398.