
ppc64le support

Open JeremyRand opened this issue 2 years ago • 5 comments

It would be cool if FEX could be ported to ppc64le, e.g. Raptor POWER9 systems. Would there be potential interest in this?

JeremyRand avatar Feb 15 '23 02:02 JeremyRand

It's an interesting idea! Sadly my knowledge of POWER dates back to the PowerMac G3 so I don't know if its feature set is capable of handling x86 emulation.

  • Does it support 4KB pages? If so, is this the common kernel config if it supports multiple sizes?
  • Does it support 128-bit Compare-exchange?
  • Does it support unaligned atomics?
    • Does it support atomic operations directly on memory locations? This is quite an improvement on ARMv8.1
  • Do its latest vector extensions support everything necessary for SSE2/3/4 like ARMv8?
    • I'm only aware of Altivec and paired singles featuresets, so no idea what the latest offers.
  • Does it support 256-bit wide vectors for AVX or is it some optional thing?
    • AVX can be ignored if not supported
  • Does it handle PCIe GPU memory accesses correctly?
    • I think this typically means treating device memory as normal memory? I know a lot of platforms mess this up.
  • Does this platform support 128-bit float in hardware? Might be interesting for x87 emulation.

This would likely end up being low priority (Like RISC-V support), also we would want something in CI if supported, since core emulation is fragile and easy to break.

Sonicadvance1 avatar Feb 15 '23 05:02 Sonicadvance1

Does it support 4KB pages? If so, is this the common kernel config if it supports multiple sizes?

Both 4 KiB and 64 KiB page size are supported; different distros make different choices here, but 64 KiB is more common. It's generally not hard to build a 4 KiB kernel on a distro that packages 64 KiB; a lot of users do this due to better compatibility with poorly written GPU drivers.
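
A minimal sketch of how the page-size question could be checked at runtime, assuming a POSIX host. The helper names `HostPageSizeFitsGuest` and `QueryHostPageSize` are hypothetical and not taken from FEX; the rule that the host page must be no larger than the 4 KiB x86 base page is a simplifying assumption for illustration.

```cpp
#include <cassert>
#include <unistd.h>

// x86 base page size; guest code expects mappings at this granularity.
constexpr long kGuestPageSize = 4096;

// Hypothetical helper: true when the host page size is no larger than the
// guest page size (assumed here to be the requirement for simple emulation).
constexpr bool HostPageSizeFitsGuest(long host_page_size) {
  return host_page_size > 0 && host_page_size <= kGuestPageSize;
}

// Asks the running kernel for its page size via POSIX sysconf.
inline long QueryHostPageSize() {
  return sysconf(_SC_PAGESIZE);
}

// Small self-check showing the two configurations discussed in the thread.
inline void PageSizeSelfCheck() {
  assert(HostPageSizeFitsGuest(4096));    // 4 KiB kernel
  assert(!HostPageSizeFitsGuest(65536));  // 64 KiB kernel
}
```

Calling `HostPageSizeFitsGuest(QueryHostPageSize())` at startup would let a program report early that it is running on a 64 KiB kernel.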

Does it support 256-bit wide vectors for AVX or is it some optional thing?

I believe the max vector width is 128-bit, but I'm not 100% sure on that.

also we would want something in CI if supported, since core emulation is fragile and easy to break.

I can't guarantee anything, but I suspect the Talos community would be able to donate access to a ppc64le VM for this purpose.

Unfortunately I don't have answers to the rest of your questions, but I'm pinging some people on IRC who may be able to chime in with better answers. I really appreciate the detailed, well-thought-out questions you posed.

JeremyRand avatar Feb 15 '23 15:02 JeremyRand

Most of my answers come from the ISA reference which can be found on the OpenPOWER Foundation site. I'm quoting from 3.0 here, which is what Power9 implements - the processor in my personal Talos II. Also note that I am not representing IBM while making this comment.

  • Does it support 4KB pages? If so, is this the common kernel config if it supports multiple sizes?

It depends on the target environment. Server distros (RHEL, Alpine) tend towards 64K, while desktop distros (Void, Adélie, Chimera) tend towards 4K. Debian ports had both varieties some years ago, but I am not sure what they have now.

  • Does it support 128-bit Compare-exchange?

Power9 (ISA 3.0) can do 128-bit atomic stores, but only with 64-bit pair values, so that's probably not what you are after. (FC=11000, "Store Twin", ISA reference pp 861)

  • Does it support unaligned atomics?

Somewhat - atomic locations must be contained within an aligned 32-byte block, but the 8 or 16 bytes may appear anywhere in the block. They just cannot cross into another block. (ISA reference pp 860, 862)
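
The block rule described here can be sketched as a simple address check. This is only an illustration of the constraint as stated in this comment, not code from FEX or from the ISA reference; the name `FitsInAtomicBlock` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Block size as stated in the comment above: an aligned 32-byte block.
constexpr std::uintptr_t kAtomicBlockSize = 32;

// Hypothetical helper: true when the byte range [addr, addr + size) lies
// entirely inside a single aligned 32-byte block.
constexpr bool FitsInAtomicBlock(std::uintptr_t addr, std::size_t size) {
  if (size == 0 || size > kAtomicBlockSize) {
    return false;
  }
  const std::uintptr_t first_block = addr / kAtomicBlockSize;
  const std::uintptr_t last_block = (addr + size - 1) / kAtomicBlockSize;
  return first_block == last_block;
}

// Small self-check with 8-byte and 16-byte accesses.
inline void AtomicBlockSelfCheck() {
  assert(FitsInAtomicBlock(24, 8));    // bytes 24..31 stay in block 0
  assert(!FitsInAtomicBlock(25, 8));   // bytes 25..32 cross into block 1
  assert(FitsInAtomicBlock(16, 16));   // bytes 16..31 stay in block 0
  assert(!FitsInAtomicBlock(17, 16));  // bytes 17..32 cross into block 1
}
```

An emulator could use a check of this shape to choose between a direct atomic operation and a slower fallback for guest accesses that straddle a block boundary.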

  • Does it support atomic operations directly on memory locations? This is quite an improvement on ARMv8.1

Yes, Atomic Memory Operations are so named because they operate on memory (but it must not be cache-inhibited; ISA reference pp 857). You can see the GCC intrinsics for inspiration.

  • Do its latest vector extensions support everything necessary for SSE2/3/4 like ARMv8?

Not really. The compiler team at IBM added some porting aids for "original" SSE via an xmmintrin.h for ppc64el, if that is helpful at all.

  • I'm only aware of Altivec and paired singles featuresets, so no idea what the latest offers.

AltiVec (aka VMX) is available, but the newer vector extensions are called VSX. They add more instructions and registers but do not increase the width of the registers (still 128 bits).

  • Does it support 256-bit wide vectors for AVX or is it some optional thing?

No.

  • Does it handle PCIe GPU memory accesses correctly?
    • I think this typically means treating device memory as normal memory? I know a lot of platforms mess this up.

I would need further clarification to answer this. I'm able to use Radeon drivers on both big and little endian Power9 systems, so I would say the platform is capable of handling PCIe GPU memory accesses 😄

  • Does this platform support 128-bit float in hardware? Might be interesting for x87 emulation.

Yes! There is quad-float support in hardware with ISA 3.0 (Power9).
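
One reason quad-float could matter for x87 emulation can be sketched with format parameters alone. The x87 extended format has a 64-bit significand and a 15-bit exponent, while IEEE 754 binary128 has 113 bits of precision and a 15-bit exponent, so every x87 value fits in binary128 exactly. The type and function names below are hypothetical, and the sketch says nothing about rounding behaviour, which an emulator would still need to handle.

```cpp
#include <cassert>

// Describes a binary floating-point format by its two size parameters.
struct FloatFormat {
  int precision_bits;  // significand bits, counting the leading bit
  int exponent_bits;   // width of the exponent field
};

constexpr FloatFormat kX87Extended{64, 15};  // x87 80-bit extended precision
constexpr FloatFormat kBinary128{113, 15};   // IEEE 754 quad precision

// Hypothetical helper: true when `wide` has at least the precision and the
// exponent width of `narrow`, so it can hold every value of `narrow` exactly.
constexpr bool HoldsEveryValueOf(FloatFormat wide, FloatFormat narrow) {
  return wide.precision_bits >= narrow.precision_bits &&
         wide.exponent_bits >= narrow.exponent_bits;
}

static_assert(HoldsEveryValueOf(kBinary128, kX87Extended),
              "binary128 covers the x87 extended format");
```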

This would likely end up being low priority (Like RISC-V support), also we would want something in CI if supported, since core emulation is fragile and easy to break.

There is ppc64el support in Travis-CI, if that would be useful for you. I see that this repo seems to use GitHub Actions; it looks like there is an obtuse but functional way to use a Power system from that. Feel free to ping me when CI would be useful, as options and available resources for open source projects may change in that timeframe.

awilfox avatar Feb 16 '23 06:02 awilfox

A lot of good information there! Looks like it might be viable to have a non-AVX implementation. It is slightly annoying that there doesn't seem to be a 128-bit CAS, but you can use the lqarx/stqcx. pair to emulate it, similar to how ARMv8 has reservation-based atomic load/stores.

I took an additional peek at the register arrangement on the platform; it seems like there are 32 GPRs, 32 FPRs, and 64 vector registers? I'm not sure how the FPRs and vector registers overlap, but hopefully it is something like how SSE overlaps MMX, or maybe both. So static register allocation likely fits on that platform.

So we just need a fast and lightweight code emitter for ISA 3.0 and someone with time to find all the problems with implementing it in FEX.

Sonicadvance1 avatar Feb 16 '23 09:02 Sonicadvance1

Hi, I understand RISC-V support isn't in the cards. I happen to own a headless Milk-V Mars (running Debian Trixie) and I intend to get my hands on a Banana Pi BPI-F3. Anyway, I tried building FEX and naturally I stumbled on this:

[0/2] Re-checking globbed directories...
[25/376] Building CXX object Source/Common/CMakeFiles/Common.dir/HostFeatures.cpp.o
FAILED: Source/Common/CMakeFiles/Common.dir/HostFeatures.cpp.o
/usr/bin/clang++ -DFEX_HAS_PRESERVE_ALL_ATTR=0 -DFEX_PRESERVE_ALL_ATTR="" -DGLOBAL_DATA_DIRECTORY=\"/usr/share/fex-emu/\" -DHAS_SYSCALL_GETCPU=1 -DHAS_SYSCALL_GETTID=1 -DHAS_SYSCALL_RENAMEAT2=1 -DHAS_SYSCALL_STATX=1 -DHAS_SYSCALL_TGKILL=1 -I/home/andrea/dev/FEX/Build/Source/Common -I/home/andrea/dev/FEX/Source/Common -I/home/andrea/dev/FEX/External/robin-map/include -I/home/andrea/dev/FEX/External/tiny-json -I/home/andrea/dev/FEX/Source -I/home/andrea/dev/FEX/Build/Source -I/home/andrea/dev/FEX/Source/Common/External/cpp-optparse -I/home/andrea/dev/FEX/Build/generated -I/home/andrea/dev/FEX/External/xbyak -I/home/andrea/dev/FEX/Build/FEXCore/Source -I/home/andrea/dev/FEX/FEXCore/include -I/home/andrea/dev/FEX/Build/include -I/home/andrea/dev/FEX/External/fmt/include -I/home/andrea/dev/FEX/External/xxhash/cmake_unofficial/.. -I/home/andrea/dev/FEX/FEXHeaderUtils/. -I/home/andrea/dev/FEX/CodeEmitter/. -O3 -DNDEBUG -fomit-frame-pointer -std=gnu++20 -flto=thin -fPIC   -Wno-trigraphs -fdiagnostics-color=always -fcolor-diagnostics -Wno-deprecated-enum-enum-conversion -Wall -MD -MT Source/Common/CMakeFiles/Common.dir/HostFeatures.cpp.o -MF Source/Common/CMakeFiles/Common.dir/HostFeatures.cpp.o.d -o Source/Common/CMakeFiles/Common.dir/HostFeatures.cpp.o -c /home/andrea/dev/FEX/Source/Common/HostFeatures.cpp
/home/andrea/dev/FEX/Source/Common/HostFeatures.cpp:619:26: error: use of undeclared identifier 'GetCPUFeaturesFromIDRegisters'
  619 |   CPUFeatures Features = GetCPUFeaturesFromIDRegisters();
      |                          ^
1 error generated.
[30/376] Building CXX object Source/Common/CMakeFiles/Common.dir/FEXServerClient.cpp.o

If you ever change your mind re: RISC-V, give me a shout!

andreamancuso avatar Dec 09 '24 23:12 andreamancuso