tinymembench icon indicating copy to clipboard operation
tinymembench copied to clipboard

Enable the AArch32 NEON implementations on AArch64 kernels

Open lgeek opened this issue 9 years ago • 10 comments

AArch64 Linux lists the "fp asimd" features in /proc/cpuinfo for VFP & NEON enabled hardware. VFP and NEON are mandatory to run Linux in AArch64 (accoring to the ARMv8 manual):

All systems that support standard operating systems with rich application environments provide hardware support for floating- point and Advanced SIMD. It is a requirement of the ARM Procedure Call Standard for AArch64.

lgeek avatar Oct 12 '16 12:10 lgeek

Well, but your patch does not fix any real problem. Or does it?

NEON is already used by the 32-bit build of tinymembench, see https://github.com/ssvb/tinymembench/wiki/PINE64-%28Allwinner-A64%29 The tricky thing is that the kernel provides different content of the /proc/cpuinfo to 32-bit and 64-bit applications.

ssvb avatar Oct 12 '16 12:10 ssvb

Check out the results on S905: https://github.com/ssvb/tinymembench/wiki/Tronsmart-Vega-S95-(Amlogic-S905). Tinymembench fails to run the NEON implementations when compiled for AArch32.

lgeek avatar Oct 12 '16 12:10 lgeek

Hmm, can you dump the /proc/cpuinfo from a 32-bit application on S905? Maybe its kernel is missing this patch - http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/296878.html

ssvb avatar Oct 12 '16 13:10 ssvb

Yeah, I can confirm that patch is missing. I'm using the Hardkernel Odroid-C2 kernel fork, so it's not just my bodged together setup which is affected.

Thanks for linking the patch, I'll apply it to my tree. However, since a number of machines are already affected and the tinymembench patch doesn't break anything, maybe it should still be applied. I'm happy to rename it to "Enable the AArch32 NEON implementations on broken AArch64 kernels" :)

lgeek avatar Oct 12 '16 13:10 lgeek

Yes, please update the patch. Also it would be great if tinymembench used this workaround, but still showed a warning to the users, encouraging them to fix the problem on the kernel side. Because if we don't do this, then we are opening a can of worms. People might think that it's okay to run such kernels. And every application which detects NEON via /proc/cpuinfo would need to apply a similar workaround too, with the reasoning that "tinymembench and the others already do this" :)

ssvb avatar Oct 12 '16 13:10 ssvb

I've had another look at this. The important bits of that patch were applied to the kernel tree I was using, I've just happened to check for the two lines which were not. However, as far as I can tell, the PER_LINUX32 personality is not automatically set for 32-bit ELF files. This means that you have to run applications using linux32 to get the backwards-compatible /proc/cpuinfo, i.e. ./tinymembench will not run the NEON & VFP implementations, but linux32 ./tinymembench will do.

lgeek avatar Oct 13 '16 10:10 lgeek

Yeah, the plot thickens...

ssvb avatar Oct 13 '16 11:10 ssvb

There was one more kernel patch - https://patchwork.kernel.org/patch/9144983/ I think that we need to ping kernel developers to clarify its status.

ssvb avatar Oct 13 '16 11:10 ssvb

This is getting quite messy. I think there are three approaches which should work both with existing kernels and keep working in the future:

  • this patch, which accepts asimd for NEON support in /proc/cpuinfo
  • calling personality(PER_LINUX32) in tinymembench compiled for AArch32
  • reading AT_HWCAP and checking for the COMPAT_HWCAP_NEON bit, which seems to be set for 32-bit ELF files on AArch64 kernels independent of the personality - I think I prefer this one over parsing /proc/cpuinfo

lgeek avatar Oct 13 '16 14:10 lgeek

Sigh. I don't feel like discussing it with the kernel developers again. But the least invasive solution is to just take your patch to treat 'asimd' as an alias for 'neon' and be done with it.

If you could update the commit message with a better explanation of the current situation, then it would be perfect.

ssvb avatar Feb 06 '17 15:02 ssvb