
`-mcx16` is only available on x86 CPUs

Open umlaeute opened this issue 3 years ago • 7 comments

on non-macOS, o2 is unconditionally compiled with the -mcx16 flag, as can be seen in https://github.com/rbdannenberg/o2/blob/253d109f4a2eaf62a885663ca50a34a6f04dbd74/CMakeLists.txt#L361

however, this flag is only available for x86 CPUs (32bit, 64bit), and not for other archs.

I discovered this when compiling for s390x (which is admittedly an odd target architecture for o2), but the problem persists with common architectures like armhf (as found e.g. in the Raspberry Pi) and arm64 (RPi4,...)

umlaeute avatar Oct 03 '22 15:10 umlaeute

Does the flag cause a serious problem? I'm not sure how to add the flag conditioned on the target architecture. As noted elsewhere, there might be a bigger issue with atomics on ARM (-mcx16 enables code generation used for atomic list operations in O2).

rbdannenberg avatar Oct 08 '22 16:10 rbdannenberg

> Does the flag cause a serious problem?

if "serious" means "does not compile" then: yes.

$ touch foo.c
$ gcc foo.c -mcx16 -o foo.o
gcc: error: unrecognized command-line option '-mcx16'
$ echo $?
1
$ uname -a
Linux amdahl 5.10.0-18-arm64 #1 SMP Debian 5.10.140-1 (2022-09-02) aarch64 GNU/Linux
$

i think it would be nice if o2 could work on Raspberry Pi and Apple M1.

umlaeute avatar Oct 08 '22 19:10 umlaeute

Yes, I agree: Raspberry Pi and Apple M1. All my regression tests run on Apple M2, so I assume M1 works as well (but it would be nice to confirm). If you mean 32-bit Raspberry Pi, then the starting point is lock-free lists. libatomic is not guaranteed to be lock-free, but I don't know the Raspberry Pi situation for sure. EDIT: Upon further investigation, it appears that Raspberry Pi can do lock-free lists, e.g. if not with libatomic, then mintomic looks good. Also, I think the 32-bit problem might just be that O2 assumes 64-bit words and 8-byte alignment of some things, but those could be enforced on 32-bit architectures for simplicity and it should all work. I am planning to buy a Raspberry Pi Zero W, figuring newer, bigger versions will be backward compatible. Any advice is welcome.

rbdannenberg avatar Oct 09 '22 01:10 rbdannenberg

Should I add an explicit option called ARM_ARCHITECTURE that will inhibit the -mcx16 flag? I don't know a better way to do this.

rbdannenberg avatar Oct 09 '22 14:10 rbdannenberg

I think it's better to not mention the "arm" architecture, as this is really about non-x86. (According to the GCC docs, '-mcx16' is a flag that is only valid on this single architecture)
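
One way to avoid naming any architecture at all is to ask the compiler whether it accepts the flag. A minimal sketch, assuming CMake's standard CheckCCompilerFlag module; the result variable `O2_HAVE_MCX16` and the target name `o2_static` are placeholders for illustration, not names confirmed from the O2 build files:

```cmake
include(CheckCCompilerFlag)

# Try to compile a trivial program with -mcx16; the result is cached
# in O2_HAVE_MCX16 (hypothetical variable name).
check_c_compiler_flag("-mcx16" O2_HAVE_MCX16)

if(O2_HAVE_MCX16)
  # "o2_static" is a placeholder; the target must already be defined
  # by add_library() before this call.
  target_compile_options(o2_static PRIVATE -mcx16)
endif()
```

Because the probe uses whichever compiler CMake has configured for the target, it should also give a sensible answer when cross-compiling, though that has not been verified here.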

umlaeute avatar Oct 09 '22 16:10 umlaeute

> Yes, I agree: Raspberry Pi and Apple M1

just to be clear: i'm of course (mostly) talking about Linux here. i threw in the "Apple M1" as an obvious bait. Currently, building for macOS/arm64 (that is, the typical "Apple M1" setup) will work fine, simply because the -mcx16 flag is omitted on this platform (presumably because the alignment is correct on this platform anyhow), which is basically hiding the issue.

> I am planning to buy a Raspberry Pi Zero W

that sounds like a good plan.

umlaeute avatar Oct 10 '22 06:10 umlaeute

FWIW there's CMAKE_SYSTEM_PROCESSOR which is not really platform agnostic, but probably set to x86_64 for Linux targets where -mcx16 is appropriate.
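
A rough sketch of that idea, assuming the strings `x86_64`, `amd64` and `AMD64` cover the x86-64 targets of interest (the exact values vary by platform and toolchain, so this list is an assumption); `O2_ARCH_FLAGS` is a hypothetical variable name:

```cmake
# CMAKE_SYSTEM_PROCESSOR names the target CPU; its spelling is
# platform-specific (typically "x86_64" on Linux, "AMD64" on Windows).
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|amd64|AMD64)$")
  set(O2_ARCH_FLAGS "-mcx16")   # hypothetical variable name
else()
  set(O2_ARCH_FLAGS "")         # non-x86-64: leave the flag out
endif()

message(STATUS "Architecture flags for ${CMAKE_SYSTEM_PROCESSOR}: '${O2_ARCH_FLAGS}'")
```

When cross-compiling, `CMAKE_SYSTEM_PROCESSOR` is expected to be set by the toolchain file, so the result is only as reliable as that setting.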

vnorilo avatar Oct 27 '22 18:10 vnorilo

I'm running O2 on Apple M1 now. Everything seems to be OK wrt -mcx16

rbdannenberg avatar Nov 15 '23 01:11 rbdannenberg

i haven't done any recent checks but when you say

> I'm running O2 on Apple M1 now. Everything seems to be OK wrt -mcx16

..., how does this relate to my

> Currently, building for macOS/arm64 (that is, the typical "Apple M1" setup) will work fine, simply because the -mcx16 flag is omitted on this platform

?

that is: i think we had already established that macOS/arm64 works fine. the problem is with Linux/non-x86

should i update the title accordingly, or just open a new issue?

umlaeute avatar Feb 09 '24 17:02 umlaeute

Sorry about the cryptic reply earlier. I'm not sure when I made the changes relative to the posts here, but I revised the code to do 16-byte alignment everywhere, even on a 32-bit Raspberry Pi, and I have O2 running on the Raspberry Pi. It's possible there's still a problem in the CMake files because -mcx16 is apparently always set for Linux. There's a comment/hint that should appear in CMake under ARCHITECTURE_C_FLAGS that says "-mcx16 flag is required for x64 builds, but you should clear this for other architectures." Maybe this should be more automatic, but it also gets confusing when you are cross-compiling, e.g. I do builds for x86_64 on my Mac M2 laptop.
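
For the cross-compiling case, one option is to keep `ARCHITECTURE_C_FLAGS` as a user-visible cache entry whose default is computed, so that an explicit setting always wins. The following is only a sketch: `O2_DEFAULT_ARCH_FLAGS` and `O2_ARCH_FLAG_LIST` are hypothetical names, and how the default is derived is left open.

```cmake
# Hypothetical default, e.g. the result of an earlier architecture or
# compiler check; falls back to "no extra flags" if nothing set it.
if(NOT DEFINED O2_DEFAULT_ARCH_FLAGS)
  set(O2_DEFAULT_ARCH_FLAGS "")
endif()

# set(... CACHE ...) does not overwrite a value already in the cache,
# so a value given with -D on the command line takes precedence.
set(ARCHITECTURE_C_FLAGS "${O2_DEFAULT_ARCH_FLAGS}" CACHE STRING
    "Architecture-specific C flags (e.g. -mcx16 for x64 builds)")

# Split the string into a list so several flags can be passed.
separate_arguments(O2_ARCH_FLAG_LIST UNIX_COMMAND "${ARCHITECTURE_C_FLAGS}")
if(O2_ARCH_FLAG_LIST)
  add_compile_options(${O2_ARCH_FLAG_LIST})
endif()
```

With this shape, a cross build can force the choice explicitly, for example `cmake -DARCHITECTURE_C_FLAGS="-mcx16" ..` for an x86_64 build on an ARM host, or `cmake -DARCHITECTURE_C_FLAGS="" ..` to clear the flag.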

rbdannenberg avatar Feb 09 '24 18:02 rbdannenberg