whisper.cpp PPC64 big-endian support

Open fitzsim opened this issue 1 year ago • 4 comments

To complete the POWER port, it would be nice if big-endian worked.

I made some progress on the primitives involved, but I could use some overall guidance. Do you think it is a worthwhile pursuit, or are there too many little-endian assumptions in the code? It would be fine if a model repacking were required, with an option to use fout.write(struct.pack(">... everywhere in convert-pt-to-ggml.py, for example. And byte reversals seem inexpensive on POWER SIMD.

Jan 05 '23 22:01 fitzsim

I think the only place that has little-endian assumptions is in the FP16 <-> FP32 naive conversion.

What would change in the POWER port if it supported big-endian, compared to the existing port? I don't see any intrinsics that depend on endianness.

Jan 06 '23 16:01 ggerganov

To get the model loading I modified read_safe with:

    if constexpr (std::endian::native == std::endian::big) {
        dest = std::byteswap(dest);
    }

Then, running with -t 1 for consistency, in the first call to ggml_vec_dot_f16 I print out the register contents as ggml_fp16_t. Here is the output on little-endian:

0: sx[0]: 2875 a902 a86b 2e3b 1f37 a8ea b122 ae1d 
0: sy[0]: 0 0 0 0 0 0 0 0 
0: sx[1]: 1f37 a8ea b122 ae1d 2f3d 3254 2a08 ae8c 
0: sy[1]: 0 0 0 0 0 0 0 0 
0: sx[2]: 2f3d 3254 2a08 ae8c ae59 2720 2c7e 205b 
0: sy[2]: 0 0 0 0 0 0 0 0 
0: sx[3]: ae59 2720 2c7e 205b a8bd a50d 234b 2456 
0: sy[3]: 0 0 0 0 0 0 0 0 
0: sx[4]: a8bd a50d 234b 2456 9cc1 a40c 910d 1998 
0: sy[4]: 0 0 0 0 0 0 0 0 
0: sx[5]: 9cc1 a40c 910d 1998 1a9e 925b 1166 1992 
0: sy[5]: 0 0 0 0 0 0 0 0 
0: sx[6]: 1a9e 925b 1166 1992 1476 1e74 a200 9de5 
0: sy[6]: 0 0 0 0 0 0 0 0 
0: sx[7]: 1476 1e74 a200 9de5 1e90 9d5e a0b1 9e2d 
0: sy[7]: 0 0 0 0 0 0 0 0 
32: sx[0]: 1e90 9d5e a0b1 9e2d 9318 1de6 1da4 1e1e 
32: sy[0]: 0 0 0 0 0 0 0 0 
32: sx[1]: 9318 1de6 1da4 1e1e 99dc a0ca 193f 1dc3 
32: sy[1]: 0 0 0 0 0 0 0 0 
32: sx[2]: 99dc a0ca 193f 1dc3 1d48 965c 9505 98bf 
32: sy[2]: 0 0 0 0 0 0 0 0 
32: sx[3]: 1d48 965c 9505 98bf 9c49 9d61 9688 177c 
32: sy[3]: 0 0 0 0 0 0 0 0 
32: sx[4]: 9c49 9d61 9688 177c 1e07 8c58 8cab 1a06 
32: sy[4]: 0 0 0 0 0 0 0 0 
32: sx[5]: 1e07 8c58 8cab 1a06 98f0 99ed 1197 8e5f 
32: sy[5]: 0 0 0 0 0 0 0 0 
32: sx[6]: 98f0 99ed 1197 8e5f 1ac6 1b7b 1a2f 1d07 
32: sy[6]: 0 0 0 0 0 0 0 0 
32: sx[7]: 1ac6 1b7b 1a2f 1d07 c48 1207 16a4 8ec3 
32: sy[7]: 0 0 0 0 0 0 0 0 
64: sx[0]: c48 1207 16a4 8ec3 1788 846c 161e 12c8 
64: sy[0]: 0 0 0 0 0 0 0 0 
64: sx[1]: 1788 846c 161e 12c8 186f 1975 1985 bc8 
64: sy[1]: 0 0 0 0 0 0 0 0 
64: sx[2]: 186f 1975 1985 bc8 b03 12a3 1c2e 9958 
64: sy[2]: 0 0 0 0 0 0 0 0 
64: sx[3]: b03 12a3 1c2e 9958 0 0 0 0 
64: sy[3]: 0 0 0 0 0 0 0 0 
64: sx[4]: 0 0 0 0 0 0 0 0 
64: sy[4]: 0 0 0 0 0 0 0 0 
64: sx[5]: 0 0 0 0 0 0 0 0 
64: sy[5]: 0 0 0 0 0 0 0 0 
64: sx[6]: 0 0 0 0 0 0 0 0 
64: sy[6]: 0 0 0 0 0 0 0 0 
64: sx[7]: 0 0 0 0 2bf7 a9f1 abe1 2e21 
64: sy[7]: 0 0 0 0 b778 b778 b778 b778

and on big-endian:

0: sx[0]: 2875 a902 a86b 2e3b 1f37 a8ea b122 ae1d 
0: sy[0]: 0 0 0 0 0 0 0 0 
0: sx[1]: 1f37 a8ea b122 ae1d 2f3d 3254 2a08 ae8c 
0: sy[1]: 0 0 0 0 0 0 0 0 
0: sx[2]: 2f3d 3254 2a08 ae8c ae59 2720 2c7e 205b 
0: sy[2]: 0 0 0 0 0 0 0 0 
0: sx[3]: ae59 2720 2c7e 205b a8bd a50d 234b 2456 
0: sy[3]: 0 0 0 0 0 0 0 0 
0: sx[4]: a8bd a50d 234b 2456 9cc1 a40c 910d 1998 
0: sy[4]: 0 0 0 0 0 0 0 0 
0: sx[5]: 9cc1 a40c 910d 1998 1a9e 925b 1166 1992 
0: sy[5]: 0 0 0 0 0 0 0 0 
0: sx[6]: 1a9e 925b 1166 1992 1476 1e74 a200 9de5 
0: sy[6]: 0 0 0 0 0 0 0 0 
0: sx[7]: 1476 1e74 a200 9de5 1e90 9d5e a0b1 9e2d 
0: sy[7]: 0 0 0 0 0 0 0 0 
32: sx[0]: 1e90 9d5e a0b1 9e2d 9318 1de6 1da4 1e1e 
32: sy[0]: 0 0 0 0 0 0 0 0 
32: sx[1]: 9318 1de6 1da4 1e1e 99dc a0ca 193f 1dc3 
32: sy[1]: 0 0 0 0 0 0 0 0 
32: sx[2]: 99dc a0ca 193f 1dc3 1d48 965c 9505 98bf 
32: sy[2]: 0 0 0 0 0 0 0 0 
32: sx[3]: 1d48 965c 9505 98bf 9c49 9d61 9688 177c 
32: sy[3]: 0 0 0 0 0 0 0 0 
32: sx[4]: 9c49 9d61 9688 177c 1e07 8c58 8cab 1a06 
32: sy[4]: 0 0 0 0 0 0 0 0 
32: sx[5]: 1e07 8c58 8cab 1a06 98f0 99ed 1197 8e5f 
32: sy[5]: 0 0 0 0 0 0 0 0 
32: sx[6]: 98f0 99ed 1197 8e5f 1ac6 1b7b 1a2f 1d07 
32: sy[6]: 0 0 0 0 0 0 0 0 
32: sx[7]: 1ac6 1b7b 1a2f 1d07 c48 1207 16a4 8ec3 
32: sy[7]: 0 0 0 0 0 0 0 0 
64: sx[0]: c48 1207 16a4 8ec3 1788 846c 161e 12c8 
64: sy[0]: 0 0 0 0 0 0 0 0 
64: sx[1]: 1788 846c 161e 12c8 186f 1975 1985 bc8 
64: sy[1]: 0 0 0 0 0 0 0 0 
64: sx[2]: 186f 1975 1985 bc8 b03 12a3 1c2e 9958 
64: sy[2]: 0 0 0 0 0 0 0 0 
64: sx[3]: b03 12a3 1c2e 9958 0 0 0 0 
64: sy[3]: 0 0 0 0 0 0 0 0 
64: sx[4]: 0 0 0 0 0 0 0 0 
64: sy[4]: 0 0 0 0 0 0 0 0 
64: sx[5]: 0 0 0 0 0 0 0 0 
64: sy[5]: 0 0 0 0 0 0 0 0 
64: sx[6]: 0 0 0 0 0 0 0 0 
64: sy[6]: 0 0 0 0 0 0 0 0 
64: sx[7]: 0 0 0 0 2bf7 a9f1 abe1 2e21
64: sy[7]: 0 0 0 0 7c 7c 7c 7c

I want to understand the b778 vs 007c difference in sy[7]. The x inputs seem to just be byteswapped, but the y inputs don't match. I can do the byteswapping via intrinsics (vec_revb), but I don't know how to reconcile the different y inputs.
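A quick check on the two sy[7] values: reversing the bytes of 0x007c gives 0x7c00, not 0xb778, so the y difference is not a plain 16-bit byte swap. (0x7c00 is the IEEE 754 half-precision encoding of +infinity.) The helper name `bswap16` below is hypothetical and the snippet is only a sketch of that arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: swap the two bytes of a 16-bit value.
inline std::uint16_t bswap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// True when `a` and `b` differ only by byte order.
inline bool is_byteswap_of(std::uint16_t a, std::uint16_t b) {
    return bswap16(a) == b;
}
```

With these, `is_byteswap_of(0x007c, 0xb778)` is false, while `bswap16(0x007c)` is `0x7c00`.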

Jan 08 '23 04:01 fitzsim

There are a few things that do not go through read_safe. For example:

mel filters: https://github.com/ggerganov/whisper.cpp/blob/52a3e0c92a8be5150d2a59e492b4943ca8a623b0/whisper.cpp#L542-L543
tensor weights: https://github.com/ggerganov/whisper.cpp/blob/52a3e0c92a8be5150d2a59e492b4943ca8a623b0/whisper.cpp#L1064-L1066

Maybe you have to byte-swap there too.
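For data read in bulk like this, the swap would have to be applied per element rather than across the whole buffer. A minimal sketch, assuming the element size is known at the call site (2 bytes for FP16, 4 for FP32); `byteswap_elements` is a hypothetical name, not an existing whisper.cpp or ggml function.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: reverse the bytes of each element in place.
// `data` points at `count` contiguous elements of `elem_size` bytes each.
inline void byteswap_elements(void * data, std::size_t count, std::size_t elem_size) {
    unsigned char * p = static_cast<unsigned char *>(data);
    for (std::size_t i = 0; i < count; ++i) {
        std::reverse(p + i * elem_size, p + (i + 1) * elem_size);
    }
}
```

It would be called right after the raw read, for example `byteswap_elements(buf, n, sizeof(float))` for an FP32 buffer of `n` values, and only on big-endian hosts.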

Jan 08 '23 10:01 ggerganov

#398 is where I'm at. Starting with the 32592001st call to ggml_vec_dot_f16, src1->data points at different data on the big-endian and little-endian targets. I'm not sure what next steps to take.

Jan 11 '23 04:01 fitzsim

Implemented in #398.

Feb 14 '23 16:02 fitzsim
