PPC implementation?

Open devnull31 opened this issue 4 years ago • 8 comments

Hello, interesting project! In the instruction matrix you mention PPC instruction coverage, but I couldn't find the corresponding header file, like the ones for NEON and x86. What about this? Thanks!

devnull31, Sep 03 '20 22:09

Hi, thanks for your interest. FYI a major API update is in the works.

We do not yet have a PPC implementation. Would you be interested in adding one?

jan-wassenberg, Sep 04 '20 07:09

I don't know much about SIMD on PPC, which is why I would be interested in your implementation ^^. Since your instruction matrix includes PPC, I thought you had at least some work in progress.

devnull31, Sep 04 '20 08:09

Same here - interested in PPC support but could only help in testing/validation aspects.

pramodk, Dec 14 '20 20:12

Hi, I understand. Let's reopen this in case someone would like to build (part of) a PPC implementation.

jan-wassenberg, Dec 17 '20 09:12

I would love to try this, but for starters, PowerPC doesn't have just one "SIMD" implementation, at least not historically. Altivec/VMX is now part of Power ISA 2.03, but there are also older cores that might not have any SIMD extension, or that have a different one. I know of at least two others, and although I'd hazard a guess that Altivec would be the most useful for most people, it'd be great to support more:

The e500 line of PowerPC cores uses the SPE (Signal Processing Engine) extensions, whose instruction encodings overlap with those used by Altivec but do not perform the same operations.

The PowerPC 750CL has something called "paired singles" (quick run-down) instead. This chip is likely derived from the Nintendo Wii's "Broadway" chip design, so the Wii probably uses paired singles rather than Altivec as well.

Some cores have no SIMD extension at all; in that case, the work would have to be done without one.

There's also the problem that I don't have much low-level experience with SIMD in general, let alone on POWER. I do have a PowerBook G4 (Motorola/Freescale 7447A, which has an older version of Altivec/VMX), so I might try to learn it and see if I can help. As silly as it is, I'd love to have an excuse to keep a G4 around.

wyatt8740, Feb 26 '21 23:02

Interesting, I hadn't heard of paired singles; they remind me a bit of 3DNow. I agree that Altivec would have the most impact.

On x86 we also have various instruction sets so what we end up doing is clustering them into SSE4, AVX2, AVX3. On SSSE3 CPUs, for simplicity, we fall back to scalar (no SIMD) because the few extra SSE4.1 ops aren't supported.

For POWER it could also make sense to check for some subset, but I'm not sure where to draw the line - is VMX enough or is VSX important for being able to implement all the ops?

jan-wassenberg, Mar 01 '21 14:03

I have ported the x86_128-inl.h header to Altivec/VSX, and the changes needed to support Altivec/VSX can be found in the https://github.com/johnplatts/highway_ppc_port repository.

The ppc_altivec-inl.h header contains the implementation of the vector types and vector ops for Altivec/VSX.

I added the HWY_ALTIVEC, HWY_PPC7, HWY_PPC8, HWY_PPC9, and HWY_PPC10 targets in the https://github.com/johnplatts/highway_ppc_port fork, as some of the functionality in the ppc_altivec-inl.h header depends on Power9 or Power10 instructions.

johnplatts, Jun 22 '22 04:06

@johnplatts Wow, that is awesome :) Would you like us to integrate that?

Adding the targets is reasonable but we're pushing the limits of 32 bits at the moment. I was anyway planning to change to 64 bits before the 1.0 release to give more room. Do you think 8 (vs the 5 you've added) is enough to accommodate any reasonable growth in the ISA over 5-10 years? Or perhaps 12 is safer?

What are you using for testing?

jan-wassenberg, Jun 22 '22 06:06

@jan-wassenberg I have been using QEMU 6.2.0, QEMU 7.2.0, and Ubuntu 22.04 on x86-64 to develop the port of Google Highway to PPC.

Here is how to set up a development environment for compiling for powerpc64le on x86-64 Ubuntu 22.04:

  1. Add the ppc64el architecture by running sudo dpkg --add-architecture ppc64el.
  2. Create an /etc/apt/sources.list.d/non-x86-cross-compile-sources.list file with the following contents:
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy-updates main restricted
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy universe
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy-updates universe
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy multiverse
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy-updates multiverse
     deb [arch=ppc64el] http://ports.ubuntu.com/ jammy-backports main restricted universe multiverse
  3. Modify the deb lines in /etc/apt/sources.list so that they only include the amd64 architecture:
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy main restricted
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy main restricted

     ## Major bug fix updates produced after the final release of the
     ## distribution.
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates main restricted
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates main restricted

     ## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
     ## team. Also, please note that software in universe WILL NOT receive any
     ## review or updates from the Ubuntu security team.
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy universe
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy universe
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates universe
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates universe

     ## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
     ## team, and may not be under a free licence. Please satisfy yourself as to
     ## your rights to use the software. Also, please note that software in
     ## multiverse WILL NOT receive any review or updates from the Ubuntu
     ## security team.
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy multiverse
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy multiverse
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates multiverse
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-updates multiverse

     ## N.B. software from this repository may not have been tested as
     ## extensively as that contained in the main release, although it includes
     ## newer versions of some applications which may provide useful features.
     ## Also, please note that software in backports WILL NOT receive any review
     ## or updates from the Ubuntu security team.
     deb [arch=amd64] http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-backports main restricted universe multiverse
     # deb-src http://us-west-2.ec2.archive.ubuntu.com/ubuntu/ jammy-backports main restricted universe multiverse

     deb [arch=amd64] http://security.ubuntu.com/ubuntu jammy-security main restricted
     # deb-src http://security.ubuntu.com/ubuntu jammy-security main restricted
     deb [arch=amd64] http://security.ubuntu.com/ubuntu jammy-security universe
     # deb-src http://security.ubuntu.com/ubuntu jammy-security universe
     deb [arch=amd64] http://security.ubuntu.com/ubuntu jammy-security multiverse
     # deb-src http://security.ubuntu.com/ubuntu jammy-security multiverse
  4. Run sudo apt update to refresh the package index files.
  5. Install the cross compilers, emulator, debugger, and target libraries:
     sudo apt install gdb-multiarch qemu-user qemu-user-static build-essential g++-12-powerpc64le-linux-gnu binutils-powerpc64le-linux-gnu binutils-powerpc64le-linux-gnu-dbg libc6:ppc64el libstdc++6:ppc64el clang-15
     This installs the g++ cross compiler for powerpc64le, clang-15 (which includes cross-compilation support for the powerpc64le target), qemu-user (which runs powerpc64le programs on x86-64 systems), gdb-multiarch (which debugs powerpc64le programs on x86-64 systems), libc6:ppc64el (the glibc shared libraries for powerpc64le), and libstdc++6:ppc64el (the libstdc++ shared library for powerpc64le).

In addition to installing the QEMU 6.2.0 packages that ship with Ubuntu 22.04, I have also compiled QEMU 7.2.0 from source, because QEMU 7.2.0 can emulate a POWER10 CPU whereas QEMU 6.2.0 can only emulate a POWER9 or earlier CPU.

Here is an example cmake command line for configuring a build of only the HWY_PPC8 target using the powerpc64le-linux-gnu-g++-12 cross compiler:

    CC=powerpc64le-linux-gnu-gcc-12 CXX=powerpc64le-linux-gnu-g++-12 cmake .. \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CXX_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CROSSCOMPILING=true \
      -DCMAKE_C_FLAGS='-mcpu=power9 -mno-power9-vector -mpower8-vector -DHWY_DISABLED_TARGETS=6917951240106147840' \
      -DCMAKE_CXX_FLAGS='-mcpu=power9 -mno-power9-vector -mpower8-vector -DHWY_DISABLED_TARGETS=6917951240106147840'

Here is an example cmake command line for configuring a build of only the HWY_PPC9 target using the powerpc64le-linux-gnu-g++-12 cross compiler:

    CC=powerpc64le-linux-gnu-gcc-12 CXX=powerpc64le-linux-gnu-g++-12 cmake .. \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CXX_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CROSSCOMPILING=true \
      -DCMAKE_C_FLAGS='-mcpu=power9 -DHWY_DISABLED_TARGETS=6918232715082858496' \
      -DCMAKE_CXX_FLAGS='-mcpu=power9 -DHWY_DISABLED_TARGETS=6918232715082858496'

Here is an example cmake command line for configuring a build of only the HWY_PPC8 target using clang++-15 (which includes cross-compilation support for powerpc64le):

    CC=clang-15 CXX=clang++-15 cmake .. \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CXX_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CROSSCOMPILING=true \
      -DCMAKE_C_FLAGS='-mcpu=power8 -mpower8-vector -DHWY_DISABLED_TARGETS=6917951240106147840' \
      -DCMAKE_CXX_FLAGS='-mcpu=power8 -mpower8-vector -DHWY_DISABLED_TARGETS=6917951240106147840'

Here is an example cmake command line for configuring a build of only the HWY_PPC9 target using clang++-15 (which includes cross-compilation support for powerpc64le):

    CC=clang-15 CXX=clang++-15 cmake .. \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CXX_COMPILER_TARGET="powerpc64le-linux-gnu" \
      -DCMAKE_CROSSCOMPILING=true \
      -DCMAKE_C_FLAGS='-mcpu=power9 -DHWY_DISABLED_TARGETS=6918232715082858496' \
      -DCMAKE_CXX_FLAGS='-mcpu=power9 -DHWY_DISABLED_TARGETS=6918232715082858496'

Passing -DCMAKE_CROSSCOMPILING_EMULATOR=/path/to/qemu_ppc64le to cmake (replacing /path/to/qemu_ppc64le with the actual path to the QEMU emulator) tells CMake which emulator to use to execute the cross-compiled Google Highway binaries, including the Google Highway unit tests.

johnplatts, Feb 18 '23 02:02

Thank you @johnplatts for the guide! This sounds doable at least for manual pre-release testing. Would it be OK for you if only the released versions of Highway are known to work with PPC? BTW I'm curious what your intended application/use case is :)

jan-wassenberg, Feb 20 '23 17:02

Thanks again @johnplatts, I was able to cross-compile and test via QEMU after setting export QEMU_LD_PREFIX=/usr/powerpc64le-linux-gnu.

It's great to have this in, I'm marking PPC as supported and it will be tested before releases.

jan-wassenberg, Feb 23 '23 17:02