
halide 18.0.0: FTBFS on arm64 with SVE: `Unhandled exception: Error: For SVE/SVE2 support, target_vector_bits=<size> must be set in target`

Open LebedevRI opened this issue 1 year ago • 12 comments

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087943

On arm64 systems with SVE support halide fails to build with the following error:

 [2087/4204] cd /<<PKGBUILDDIR>>-build/build/stage-0/halide/python_bindings/apps && /<<PKGBUILDDIR>>-build/build/stage-0/halide/tutorial/lesson_21_auto_scheduler_generate -r app_aot_bilateral_grid.runtime -o . -e object target=host
 FAILED: python_bindings/apps/app_aot_bilateral_grid.runtime.o /<<PKGBUILDDIR>>-build/build/stage-0/halide/python_bindings/apps/app_aot_bilateral_grid.runtime.o
 cd /<<PKGBUILDDIR>>-build/build/stage-0/halide/python_bindings/apps && /<<PKGBUILDDIR>>-build/build/stage-0/halide/tutorial/lesson_21_auto_scheduler_generate -r app_aot_bilateral_grid.runtime -o . -e object target=host
 Unhandled exception: Error: For SVE/SVE2 support, target_vector_bits=<size> must be set in target.

To check if the system has SVE support:

 $ lscpu | grep sve
 Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

This is probably essentially a duplicate of https://github.com/halide/Halide/issues/8114. Has this been resolved in 19.0.0 by any chance?

LebedevRI avatar Dec 20 '24 15:12 LebedevRI

I think this would have been introduced by #8298 when we added SVE support detection. We don't have any SVE-capable hardware, which is why this didn't come up in testing. A quick workaround would be to set Halide_TARGET to something like arm-64-linux-arm_dot_prod-arm_fp16 to skip feature detection for SVE.

Relatedly, it would be nice to have no_-* features so we could write host-no_sve instead.
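
To make the `no_` idea concrete, here is a rough sketch of how such tokens could be applied on top of a detected feature set. Everything in it is hypothetical: `apply_feature_tokens` and its string-set representation are stand-ins for illustration, not Halide's target parser. One wrinkle it tries to show is that some existing feature names (for example `no_asserts`) already start with `no_`, so a token is first checked against the known names before being treated as a removal.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Halide: applies '-'-separated feature
// tokens to an already-detected feature set. A token that is itself a known
// feature name is added; otherwise a "no_" prefix removes the named feature.
std::set<std::string> apply_feature_tokens(std::set<std::string> detected,
                                           const std::set<std::string> &known,
                                           const std::string &tokens) {
    const std::string prefix = "no_";
    std::istringstream in(tokens);
    std::string tok;
    while (std::getline(in, tok, '-')) {
        if (known.count(tok)) {
            // Covers existing names such as "no_asserts" that start with "no_".
            detected.insert(tok);
        } else if (tok.compare(0, prefix.size(), prefix) == 0 &&
                   known.count(tok.substr(prefix.size()))) {
            // "no_<feature>" drops a feature that detection turned on.
            detected.erase(tok.substr(prefix.size()));
        }
        // Unknown tokens are ignored in this sketch; a real parser would reject them.
    }
    return detected;
}
```

With a detected set of `{sve, arm_dot_prod}`, the token string `no_sve` would leave only `arm_dot_prod`, which is the effect `host-no_sve` is meant to have.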

@steven-johnson -- is it ever appropriate to pick target_vector_bits as part of host detection? Should it be detected only in the runtime compatibility stage, and not in host?

alexreinking avatar Dec 20 '24 17:12 alexreinking

Likely the answer is to not turn on any SVE features automatically for compilation. Arguably if computing a target for JIT, it is reasonable, but in that case target_vector_bits should be set as well. Even in the JIT case it will be dicey because real systems are likely to be heterogeneous re: CPUs.

At present, the entire family of SVE stuff is a failure as any sort of mass market technology. Use cases are inherently specialized and trying to make it easy to use it is pointless.

zvookin avatar Dec 20 '24 18:12 zvookin

> Likely the answer is to not turn on any SVE features automatically for compilation

+1

steven-johnson avatar Dec 20 '24 19:12 steven-johnson

Okay, and what about the #8114 case? I think RISC-V does not have fixed vectors.

LebedevRI avatar Dec 20 '24 19:12 LebedevRI

@alexreinking we are also facing this issue on macOS (Apple silicon) and Linux (AWS Graviton 3 and up). Unfortunately, your suggested Halide_TARGET doesn't seem to disable SVE: we still get the same error.

And when we try to set the vector size, using either arm-64-linux-sve-vector_bits_128 or arm-64-linux-sve2-vector_bits_128, we get the following error at runtime:

SIGILL - Illegal instruction signal

Is there anything we can try to force disable sve? Thank you!

gregory-rizzo avatar Sep 16 '25 03:09 gregory-rizzo

@gregory-rizzo -- do you have a simple reproducer? I have access to an Apple Silicon machine. That would go a long way in getting this bug fixed.

alexreinking avatar Sep 17 '25 01:09 alexreinking

So... there is at least one bug here that is fairly easy to fix. The automatic feature detection code should never set the SVE/SVE2 flags on ARM or the RVV flag on RISC-V without also setting a vector width. I.e. it needs to go a step further and determine the hardware supported vector width.
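
A minimal sketch of that rule, under stated assumptions: `DetectedVectorSupport`, `resolve_sve`, and `query_sve_vector_bits` are hypothetical names for illustration and are not Halide's detection code. The policy function only reports SVE/SVE2 when a plausible vector width comes with them, and drops an SVE2 claim that arrives without SVE. On Linux, `prctl(PR_SVE_GET_VL)` returns the current SVE vector length in bytes in its low 16 bits, which is one way the width could be obtained; on other platforms this sketch simply reports 0.

```cpp
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical stand-in for the SVE part of a detected host target.
struct DetectedVectorSupport {
    bool sve = false;
    bool sve2 = false;
    int vector_bits = 0;  // 0 means "no scalable-vector width known"
};

// Policy: never report SVE/SVE2 unless a usable vector width accompanies them.
DetectedVectorSupport resolve_sve(bool hw_sve, bool hw_sve2, int hw_vector_bits) {
    DetectedVectorSupport out;
    // Architecturally, SVE vector lengths are multiples of 128 bits up to 2048.
    const bool width_ok = hw_vector_bits >= 128 && hw_vector_bits <= 2048 &&
                          hw_vector_bits % 128 == 0;
    if (!width_ok) {
        return out;  // No width, so no SVE features at all.
    }
    out.sve = hw_sve;
    // SVE2 builds on SVE, so an SVE2 flag without SVE is treated as bogus.
    out.sve2 = hw_sve && hw_sve2;
    if (out.sve) {
        out.vector_bits = hw_vector_bits;
    }
    return out;
}

// Returns the current SVE vector length in bits, or 0 if it cannot be queried.
int query_sve_vector_bits() {
#if defined(__linux__) && defined(PR_SVE_GET_VL)
    const int r = prctl(PR_SVE_GET_VL);
    if (r < 0) {
        return 0;  // Fails with EINVAL when SVE is unavailable.
    }
    return (r & PR_SVE_VL_LEN_MASK) * 8;  // Low 16 bits hold the length in bytes.
#else
    return 0;
#endif
}
```

Under this policy, a host that advertises SVE but yields no width resolves to plain NEON-level features instead of an incomplete target that later trips the `target_vector_bits` check.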

Apple Silicon does not support SVE/SVE2 and the automatic detection code does not look like it can turn it on. The #ifdef tree is a bit problematic in that it doesn't use else blocks except for the last one, but I'm pretty sure that code never turns on SVE or SVE2 for Apple hosts. If something is turning it on manually, then illegal instruction exceptions are expected.

Graviton 3 seems to support SVE but not SVE2 and supposedly Graviton 4 supports both. If one is getting illegal instruction exceptions with the appropriate target flags on those systems, that is a codegen bug. I wouldn't be terribly surprised if there are bugs in SVE/SVE2 codegen, either at the Halide or LLVM level, as this has not been well tested due to lack of hardware support. (It is possible the issue is that the vector width needs to be set to 256 bits, but code compiled for a smaller vector width should run on hardware with a larger one.) It would be great to get localized disassembly of what is actually failing here, though to make progress it will likely require someone getting access to that hardware.

SVE/SVE2 support is pretty dubious given that hardware support is not really happening. I'd suggest just deprecating the whole thing but SME/SME2 support would be broadly useful and likely depends on the SVE2 support. Plus vscale support is still necessary for RISC-V so removing SVE/SVE2 doesn't get rid of that huge wart.

zvookin avatar Sep 17 '25 06:09 zvookin

Sorry, I didn't give enough details: I noticed SVE2 being detected on some Apple Silicon models when using VMware Fusion and an Ubuntu ARM VM. I will try to come up with an easy sequence of instructions to reproduce the issue and more details about where I am seeing it happen.

gregory-rizzo avatar Sep 17 '25 15:09 gregory-rizzo

Can you say what Apple processor you are working on? Guess I'll write some code to ask an M4 what it thinks it supports as clearly Arm wanted SME to imply SVE and Apple nixed that. I'll be very surprised if it claims SVE support however. Perhaps there's a bug in the Linux support which assumed SME implied SVE as that seemed like a good assumption at one time. One can look at /proc/cpuinfo under the VM to see whether SVE/SVE2 are in the flags list.
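
For scripting that check, a small sketch (the helper name `has_cpu_flag` is made up for illustration) that tests a `Features:` line for a flag as a whole token. Matching whole tokens matters because a plain substring search for `sve` also hits `svei8mm` and `svebf16`.

```cpp
#include <sstream>
#include <string>

// Returns true when `flag` appears as a whole whitespace-separated token
// in a "Features:" (or lscpu "Flags:") line.
bool has_cpu_flag(const std::string &features_line, const std::string &flag) {
    std::istringstream tokens(features_line);
    std::string token;
    while (tokens >> token) {
        if (token == flag) {
            return true;
        }
    }
    return false;
}
```

Calling it once with `"sve"` and once with `"sve2"` on the same line shows whether a kernel reports SVE2 without SVE, which would point at an inconsistent report.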

zvookin avatar Sep 17 '25 17:09 zvookin

OK, sorry, it took me a bit of time to investigate because I realized I had hit a very unusual use case. I am using macOS with VMware Fusion (which you can use for free). I gathered a bunch of different Apple hardware and tested with two VMs: one with Ubuntu server 24.04.3 and one with Ubuntu server 20.04.5.

For each combination, I ran cat /proc/cpuinfo:

MacBook Pro Apple M1 Max

Ubuntu server 24.04.3 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint

Ubuntu server 20.04.5 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint

Mac mini Apple M2 Pro

Ubuntu server 24.04.3 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint

Ubuntu server 20.04.5 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint

Mac mini Apple M4 Pro

Ubuntu server 24.04.3 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint

Ubuntu server 20.04.5 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint

So somehow sve2 only shows up on a Mac mini with M4 Pro and with Ubuntu server 20.04.5 (not 24.04.3).

Once you have that VM running, just compiling Halide 19 using the Readme instructions results in the following error:

 [1978/4252] Building C object test/correctness/CMakeFiles/correctness_plain_c_includes.dir/plain_c_includes.c.o
 cc1: warning: command line option ‘-Woverloaded-virtual’ is valid for C++/ObjC++ but not for C
 cc1: warning: command line option ‘-Wsuggest-override’ is valid for C++/ObjC++ but not for C
 cc1: warning: command line option ‘-Wno-old-style-cast’ is valid for C++/ObjC++ but not for C
 [2066/4252] Generating my_first_generator.runtime.o
 FAILED: tutorial/my_first_generator.runtime.o /home/test/Halide/build/tutorial/my_first_generator.runtime.o
 cd /home/test/Halide/build/tutorial && /home/test/Halide/build/tools/gengen -r my_first_generator.runtime -o . -e object target=host
 Unhandled exception: Error: For SVE/SVE2 support, target_vector_bits=<size> must be set in target.
 [2068/4252] Generating my_second_generator_1.runtime.o
 FAILED: tutorial/my_second_generator_1.runtime.o /home/test/Halide/build/tutorial/my_second_generator_1.runtime.o
 cd /home/test/Halide/build/tutorial && /home/test/Halide/build/tools/gengen -r my_second_generator_1.runtime -o . -e object target=host
 Unhandled exception: Error: For SVE/SVE2 support, target_vector_bits=<size> must be set in target.
 [2077/4252] Building CXX object python_bindings/src/halide/CMakeFiles/Halide_Python.dir/halide_/PyParam.cpp.o
 ninja: build stopped: subcommand failed.

gregory-rizzo avatar Sep 29 '25 03:09 gregory-rizzo

> So somehow sve2 only shows up on a Mac mini with M4 Pro and with Ubuntu server 20.04.5 (not 24.04.3).

There must have been a bug in the older kernel to report support where there is none...

One possible workaround would be to set Halide_TARGET=arm-64-linux-arm_dot_prod-arm_fp16 at the CMake command line. You can also disable the tutorials with WITH_TUTORIALS=NO.

alexreinking avatar Sep 29 '25 13:09 alexreinking

Yes, I saw that suggestion earlier in the thread, but it didn't seem to disable SVE for me and I was still getting the error. Instead, I was able to disable SVE by recompiling Halide with this diff:

index bdab34b75..df7b1fc91 100644
--- a/src/Target.cpp
+++ b/src/Target.cpp
@@ -243,13 +243,13 @@ Target calculate_host_target() {
         initial_features.push_back(Target::ARMFp16);
     }
 
-    if (hwcaps & HWCAP_SVE) {
+    /*if (hwcaps & HWCAP_SVE) {
         initial_features.push_back(Target::SVE);
     }
 
     if (hwcaps2 & HWCAP2_SVE2) {
         initial_features.push_back(Target::SVE2);
-    }
+    }*/
 #endif
 
 #ifdef _MSC_VER
@@ -268,9 +268,9 @@ Target calculate_host_target() {
         initial_features.push_back(Target::ARMDotProd);
     }
 
-    if (IsProcessorFeaturePresent(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE)) {
+    /*if (IsProcessorFeaturePresent(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE)) {
         initial_features.push_back(Target::SVE);
-    }
+    }*/
 
 #endif

gregory-rizzo avatar Oct 02 '25 17:10 gregory-rizzo