aarch64 feature identification failure (mrs illegal instruction)

Open nsluhrs opened this issue 1 year ago • 11 comments

Zig Version

0.14.0-dev.1063+1018cdc0a

Steps to Reproduce and Observed Behavior

Run Zig's aarch64 build on a Samsung Galaxy Note 9 via Termux and execute the zig targets command. This results in an illegal instruction. Some other commands fail the same way, but I think it's the same root cause.

With the help of the Discord, we think we identified the root issue (https://discord.com/channels/605571803288698900/1273695605637644399). The following is a summary of the findings.

As part of identifying the available features on aarch64 platforms (getAArch64CpuFeature), the mrs instruction is used to read the feature registers. On the Qualcomm Snapdragon 845 in the Samsung Galaxy Note 9, this appears to be an illegal instruction. It seems like we could fall back to something similar to what is done for the non-aarch64 cases, potentially by putting an additional condition around the switch case on line 394 of lib/std/zig/system/linux.zig so that it defaults to /proc/cpuinfo when mrs is unavailable. I also tried looking into arm.zig to figure out what those values should be, but I wasn't able to locate all the flags listed by /proc/cpuinfo.
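To make the proposed fallback concrete, here is a minimal sketch of reading the feature flags from /proc/cpuinfo text. It is written in Python purely for illustration and is not how Zig's standard library does it. The function name parse_cpuinfo_features and the SAMPLE_CPUINFO text are hypothetical; the sample is not the contents of the attached cpuinfo.txt.

```python
# Hypothetical sample in the style of an aarch64 /proc/cpuinfo entry.
# This is NOT the reporter's attached cpuinfo.txt.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "BogoMIPS\t: 38.40\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"
    "CPU implementer\t: 0x51\n"
    "\n"
    "processor\t: 1\n"
    "BogoMIPS\t: 38.40\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"
)


def parse_cpuinfo_features(text):
    """Return the set of flags found on every 'Features' line of the text."""
    features = set()
    for line in text.splitlines():
        # Each line looks like "key<TAB>: value"; lines without ':' are skipped.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


if __name__ == "__main__":
    print(sorted(parse_cpuinfo_features(SAMPLE_CPUINFO)))
```

On a real device the text would come from open("/proc/cpuinfo").read(). The names returned are the kernel's flag names (for example asimd), which would still need to be mapped onto whatever feature names the compiler uses.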

For reference, here are the contents of my /proc/cpuinfo and my lscpu output: cpuinfo.txt, lscpu.txt

Expected Behavior

Output list of available platforms

nsluhrs avatar Aug 16 '24 10:08 nsluhrs

Does Zig care about all the feature flags, or is there a subset that actually gets used? For instance, I didn't see the asimd flag in the arm.zig file.

nsluhrs avatar Aug 16 '24 10:08 nsluhrs

It would be nice to know exactly which of these registers are problematic on that machine. It would be fairly surprising if the mrs instruction is just blocked wholesale.

alexrp avatar Aug 16 '24 12:08 alexrp

The failed instruction is mrs x9, midr_el1, so MIDR_EL1 is definitely affected, but since that's the first register read, it doesn't rule out mrs not working at all. Is there a good way for me to test the other ones? I'm not too good at actually writing Zig yet, so I might need a minimal example or something.

nsluhrs avatar Aug 16 '24 12:08 nsluhrs

Well, you can just try editing that array of registers, replacing calls with 0 until you narrow down which work and which don't. Armed with that info, I think it'd be easier to determine the right way forward.

alexrp avatar Aug 16 '24 12:08 alexrp

I changed it in the source before rebuilding, but that doesn't seem to have properly disabled the MIDR check.

nsluhrs avatar Aug 16 '24 13:08 nsluhrs

I figured out that part. I've tried 4 different registers, and none of them work. I really suspect that the mrs instruction just doesn't work. It's kind of a lot of steps for each register I turn off. I checked the 1st, 2nd, 3rd, and 8th individually.

nsluhrs avatar Aug 17 '24 16:08 nsluhrs

Now that I take a closer look, I actually have no idea how this ever worked? EL0 is user space and EL1 is kernel mode; of course these registers will not be readable.

alexrp avatar Aug 17 '24 16:08 alexrp

The problem was introduced with https://github.com/ziglang/zig/pull/19595.

alexrp avatar Aug 17 '24 16:08 alexrp

@nsluhrs what is your kernel version?

alexrp avatar Aug 17 '24 22:08 alexrp

I was just checking that: 4.9.186-22990479, unfortunately. It looks like that feature wasn't allowed until 4.10. Could we have it check the kernel version as part of the process, and would we be willing to support such an old kernel version? Alternatively, a way to specify available features manually might be an interesting option.
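A kernel version check along these lines could be sketched as follows, again in Python for illustration only. The helper names parse_kernel_release and kernel_at_least are hypothetical, and the minimum version is passed in by the caller rather than hard-coded, since the exact cutoff for this feature is only stated informally in this thread.

```python
import platform
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.9.186-22990479'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % (release,))
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum):
    """True if the release is at or above the (major, minor) minimum."""
    # Tuples of ints compare numerically, so (4, 9) < (4, 10) as intended;
    # comparing the raw strings would get this wrong.
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # The threshold here is only an example value supplied by the caller.
    print(kernel_at_least(platform.release(), (4, 10)))
```

With the release string reported above, kernel_at_least("4.9.186-22990479", (4, 10)) is false, so a caller could skip the mrs path and use a fallback instead.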

nsluhrs avatar Aug 18 '24 07:08 nsluhrs

Formally, we only support Linux 4.19+. However, in general, patches to support older systems are welcome, as long as they have minimal maintenance cost and have no impact on code compiled for newer (supported) systems.

Anyway, can you not just upgrade the device to a newer kernel version? Linux 4.10 was released in 2017, while the device was released in 2018, and Linux 4.19 also in 2018.

alexrp avatar Aug 21 '24 14:08 alexrp

"The failed instruction is mrs x9, midr_el1"

Also see previous discussion in https://github.com/termux/termux-packages/issues/20783#issuecomment-2211719239

LinuxUserGD avatar Aug 31 '24 18:08 LinuxUserGD

This seems reasonable, I'll go ahead and close the issue!

nsluhrs avatar Sep 05 '24 12:09 nsluhrs