inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch 11002 | vst1q_u64 (uint64_t * __a, uint64x2_t __b)

Open stefson opened this issue 3 years ago • 33 comments
hi, this is most likely a regression from https://github.com/google/highway/commit/d8867c95df5c5bcd33562b3a24c96f5a54d298a8

compiler is gcc-10.4.0 on armhf

[8/38] /usr/bin/armv7a-unknown-linux-gnueabihf-g++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999  -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc
FAILED: CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o 
/usr/bin/armv7a-unknown-linux-gnueabihf-g++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999  -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:23,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h: In function ‘std::vector<unsigned int> hwy::SupportedAndGeneratedTargets()’:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/detect_targets.h:438:50: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
  438 | #define HWY_TARGETS (HWY_ATTAINABLE_TARGETS & (2 * HWY_STATIC_TARGET - 1))
      |                                                  ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:69:48: note: in expansion of macro ‘HWY_TARGETS’
   69 |   for (uint32_t targets = SupportedTargets() & HWY_TARGETS; targets != 0;
      |                                                ^~~~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:25,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h: In member function ‘size_t hwy::ChosenTarget::GetIndex() const’:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/detect_targets.h:438:50: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
  438 | #define HWY_TARGETS (HWY_ATTAINABLE_TARGETS & (2 * HWY_STATIC_TARGET - 1))
      |                                                  ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:157:7: note: in definition of macro ‘HWY_CHOSEN_TARGET_SHIFT’
  157 |   ((((X) >> (HWY_HIGHEST_TARGET_BIT + 1 - HWY_MAX_DYNAMIC_TARGETS)) & \
      |       ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:163:28: note: in expansion of macro ‘HWY_TARGETS’
  163 |   (HWY_CHOSEN_TARGET_SHIFT(HWY_TARGETS) | HWY_CHOSEN_TARGET_MASK_SCALAR | 1u)
      |                            ^~~~~~~~~~~
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:267:47: note: in expansion of macro ‘HWY_CHOSEN_TARGET_MASK_TARGETS’
  267 |                                               HWY_CHOSEN_TARGET_MASK_TARGETS);
      |                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:29,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:322,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h: In function ‘void hwy::N_NEON::StoreU(hwy::N_NEON::Vec128<long long unsigned int, 2>, hwy::N_NEON::Full128<long long unsigned int>, uint64_t*)’:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
11002 | vst1q_u64 (uint64_t * __a, uint64x2_t __b)
      | ^~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:322,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
                 from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:2744:12: note: called from here
 2744 |   vst1q_u64(unaligned, v.raw);
      |   ~~~~~~~~~^~~~~~~~~~~~~~~~~~

full build log: build.log.zip

stefson avatar Jul 07 '22 10:07 stefson

@stefson hm, I'm mystified 😦 Here is the function definition I see from armhf gcc 10:

__extension__ extern __inline void
__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
vst1q_u64 (uint64_t * __a, uint64x2_t __b)
{
  __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
}

Unlike the p64 version, which sets extra attributes, I don't see any here. If you comment out the call to vst1q_u64, are there any other errors?

For the warning, I think we can fix that by replacing 2 with 2ULL; we'll anyway soon make targets 64-bit.
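
To illustrate the warning: a minimal sketch, assuming the static target bit sits at position 30 (which would match the reported result of -2147483648). The names kStaticTarget and TargetsUpToStatic are hypothetical stand-ins, not Highway identifiers.

```cpp
#include <cstdint>

// Assumption for illustration only: the static target is bit 30 of an int.
constexpr int kStaticTarget = 1 << 30;

// Writing `2 * kStaticTarget - 1` would multiply in `int`, and 2^31 does not
// fit in a 32-bit int, hence -Woverflow. With 2ULL the other operand is
// converted to unsigned long long first, so the product is representable.
constexpr uint64_t TargetsUpToStatic() {
  return 2ULL * kStaticTarget - 1;
}

// All bits up to and including bit 30 are set.
static_assert(TargetsUpToStatic() == 0x7FFFFFFFull,
              "mask covers bits 0..30");
```

The resulting mask can then be ANDed with the attainable targets in the same way the HWY_TARGETS macro does in the log above.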

jan-wassenberg avatar Jul 12 '22 14:07 jan-wassenberg

@stefson do you have any idea what might be happening? If not, one option is to disable runtime dispatch for armv7 on this version of GCC.

jan-wassenberg avatar Jul 22 '22 14:07 jan-wassenberg

could you guide me a little bit in how to disable runtime dispatch?

stefson avatar Jul 22 '22 17:07 stefson

Sure, in detect_targets.h we have a line #if HWY_ARCH_X86 || (HWY_ARCH_ARM && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX). You can for example change HWY_ARCH_ARM to HWY_ARCH_ARM_A64.

jan-wassenberg avatar Jul 25 '22 07:07 jan-wassenberg

do you mean as in:

diff --git a/hwy/detect_targets.h b/hwy/detect_targets.h
index afc9154..7be7770 100644
--- a/hwy/detect_targets.h
+++ b/hwy/detect_targets.h
@@ -372,7 +372,7 @@
 
 // x86 compilers generally allow runtime dispatch. On Arm, currently only GCC
 // does, and we require Linux to detect CPU capabilities.
-#if HWY_ARCH_X86 || (HWY_ARCH_ARM && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX)
+#if HWY_ARCH_X86 || (HWY_ARCH_ARM_A64 && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX)
 #define HWY_HAVE_RUNTIME_DISPATCH 1
 #else
 #define HWY_HAVE_RUNTIME_DISPATCH 0

?

stefson avatar Jul 25 '22 09:07 stefson

Yes, looks good :) If this helps you, feel free to send this as a pull request, or we can do it if you prefer.

jan-wassenberg avatar Jul 25 '22 10:07 jan-wassenberg

it helps indeed, but I need more time to iron this out - arm hardware really is slow.

stefson avatar Jul 25 '22 11:07 stefson

I wonder about a sensible strategy for a fix on the compiler side?

stefson avatar Sep 05 '22 09:09 stefson

I cannot reproduce any compilation issue on armhf/gcc10|11|12.

For reference:

% grep -3 vst1q_u64 /usr/lib/gcc/arm-linux-gnueabihf/*/include/arm_neon.h
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-  __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-}
--
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-  __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-}
--
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-  __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-}

% gcc-10 --version
gcc-10 (Debian 10.4.0-4) 10.4.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% gcc-11 --version
gcc-11 (Debian 11.3.0-5) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% gcc-12 --version
gcc-12 (Debian 12.2.0-1) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

malaterre avatar Sep 05 '22 11:09 malaterre

I see the same results as you for my gcc-10 armv7a cross compiler, but still it gives me the error without the now pushed patch.

stefson avatar Sep 05 '22 13:09 stefson

> I see the same results as you for my gcc-10 armv7a cross compiler, but still it gives me the error without the now pushed patch.

Add '--verbose' to the compilation line that is failing and post back. Eg.:

/usr/bin/armv7a-unknown-linux-gnueabihf-g++ --verbose -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS [...]

malaterre avatar Sep 05 '22 13:09 malaterre

hey, here is my output from --verbose, it is with commit https://github.com/google/highway/commit/9b3bd6d48445443afaa7211a4e72842373d28f1e to not hide the problem with the current workaround:

LANG=C /usr/bin/armv7a-unknown-linux-gnueabihf-g++ --verbose -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999  -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -c /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc
Using built-in specs.
COLLECT_GCC=/usr/bin/armv7a-unknown-linux-gnueabihf-g++
Target: armv7a-unknown-linux-gnueabihf
Configured with: /var/tmp/portage/cross-armv7a-unknown-linux-gnueabihf/gcc-10.4.0/work/gcc-10.4.0/configure --host=x86_64-pc-linux-gnu --target=armv7a-unknown-linux-gnueabihf --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/armv7a-unknown-linux-gnueabihf/gcc-bin/10.4.0 --includedir=/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include --datadir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0 --mandir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/man --infodir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/info --with-gxx-include-dir=/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10 --with-python-dir=/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 10.4.0 p5' --disable-esp --enable-libstdcxx-time --disable-libstdcxx-pch --enable-poison-system-directories --with-sysroot=/usr/armv7a-unknown-linux-gnueabihf --disable-bootstrap --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-fixed-point --with-float=hard --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-vtable-verify --disable-libvtv --without-zstd --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.4.0 (Gentoo 10.4.0 p5) 
COLLECT_GCC_OPTIONS='-v' '-D' 'HWY_SHARED_DEFINE' '-D' 'hwy_contrib_EXPORTS' '-I' '/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999' '-O2' '-pipe' '-fomit-frame-pointer' '-fPIC' '-fvisibility=hidden' '-fvisibility-inlines-hidden' '-Wno-builtin-macro-redefined' '-D' '__DATE__="redacted"' '-D' '__TIMESTAMP__="redacted"' '-D' '__TIME__="redacted"' '-fmerge-all-constants' '-Wall' '-Wextra' '-Wconversion' '-Wsign-conversion' '-Wvla' '-Wnon-virtual-dtor' '-fmath-errno' '-fno-exceptions' '-MD' '-MT' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o' '-MF' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d' '-o' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o' '-c' '-shared-libgcc'  '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mtls-dialect=gnu' '-marm' '-mlibarch=armv7-a+fp' '-march=armv7-a+fp'
 /usr/libexec/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/cc1plus -quiet -v -I /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999 -MD CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.d -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -D_GNU_SOURCE -D HWY_SHARED_DEFINE -D hwy_contrib_EXPORTS -D __DATE__="redacted" -D __TIMESTAMP__="redacted" -D __TIME__="redacted" /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc -quiet -dumpbase vqsort_i64d.cc -mfloat-abi=hard -mfpu=vfpv3-d16 -mtls-dialect=gnu -marm -mlibarch=armv7-a+fp -march=armv7-a+fp -auxbase-strip CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -O2 -Wno-builtin-macro-redefined -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -version -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno -fno-exceptions -o - |
 /usr/libexec/gcc/armv7a-unknown-linux-gnueabihf/as -v -I /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999 -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5 -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o
GNU assembler version 2.36.1 (armv7a-unknown-linux-gnueabihf) using BFD version (Gentoo 2.36.1 p5) 2.36.1
Assembler messages:
Fatal error: can't create CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o: No such file or directory
GNU C++14 (Gentoo 10.4.0 p5) version 10.4.0 (armv7a-unknown-linux-gnueabihf)
	compiled by GNU C version 10.4.0, GMP version 6.2.1, MPFR version 4.1.0-p13, MPC version 1.2.1, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/armv7a-unknown-linux-gnueabihf/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/../../../../armv7a-unknown-linux-gnueabihf/include"
#include "..." search starts here:
#include <...> search starts here:
 /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999
 /usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10
 /usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10/armv7a-unknown-linux-gnueabihf
 /usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10/backward
 /usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include
 /usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include-fixed
 /usr/armv7a-unknown-linux-gnueabihf/usr/include
End of search list.
GNU C++14 (Gentoo 10.4.0 p5) version 10.4.0 (armv7a-unknown-linux-gnueabihf)
	compiled by GNU C version 10.4.0, GMP version 6.2.1, MPFR version 4.1.0-p13, MPC version 1.2.1, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: d32c7f800b89674769804ef9c6a8ad26
In file included from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:29,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:358,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits-inl.h:27,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:23,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:20:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h: In function 'void hwy::N_NEON::StoreU(hwy::N_NEON::Vec128<long long int, 2>, hwy::N_NEON::Full128<long long int>, int64_t*)':
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to 'always_inline' 'void vst1q_s64(int64_t*, int64x2_t)': target specific option mismatch
10958 | vst1q_s64 (int64_t * __a, int64x2_t __b)
      | ^~~~~~~~~
In file included from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:358,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits-inl.h:27,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:23,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
                 from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:20:
/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:2765:12: note: called from here
 2765 |   vst1q_s64(unaligned, v.raw);
      |   ~~~~~~~~~^~~~~~~~~~~~~~~~~~

frankly I don't see any difference in the error, does the other information tell you something?

stefson avatar Sep 05 '22 13:09 stefson

@jan-wassenberg Do you believe it makes sense to compile Highway with NEON support using the default -mfpu=vfpv3-d16 (generic-armv7-a defaults to vfpv3-d16) ...
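
One way to check what a toolchain's default flags enable is to look at the ACLE feature macros the compiler predefines. A minimal sketch, assuming the compiler defines __ARM_NEON (or the older __ARM_NEON__) only when NEON is enabled on the command line or by its configured defaults; the helper name DescribeDefaultFpu is hypothetical.

```cpp
#include <string>

// Hypothetical helper: reports whether this translation unit was compiled
// with NEON enabled by the command-line (or configured default) flags.
// Per-function target attributes or pragmas are not reflected here.
inline std::string DescribeDefaultFpu() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return "NEON enabled by the default flags";
#elif defined(__arm__)
  return "32-bit Arm, NEON not enabled by the default flags";
#else
  return "neither NEON nor a 32-bit Arm target";
#endif
}
```

With a compiler configured like the one in the --verbose output above (-mfpu=vfpv3-d16), the second branch would be expected, since that FPU setting does not include NEON.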

malaterre avatar Sep 05 '22 13:09 malaterre

@malaterre, good catch, thanks for pointing that out. vfpv4 has been supported since 2009, so I'd be surprised if anyone still cares about vfpv3. set_macros-inl.h does:

#if HWY_ARCH_ARM_V7
#define HWY_TARGET_STR "+neon-vfpv4"

It makes sense that the compiler complains: arm_neon.h is compiled with the default target, whereas we set vfpv4 only for the Highway implementation and user code.

Here's an idea, @stefson: does it help to move the following block in arm_neon-inl.h to the line after HWY_BEFORE_NAMESPACE();?

HWY_DIAGNOSTICS(push)
HWY_DIAGNOSTICS_OFF(disable : 4701, ignored "-Wuninitialized")
#include <arm_neon.h>
HWY_DIAGNOSTICS(pop)

jan-wassenberg avatar Sep 05 '22 14:09 jan-wassenberg

Can you please post a patch against latest git for your idea? The risk of a misunderstanding is too high if you ask me that way :D

stefson avatar Sep 05 '22 14:09 stefson

Sure, sent :)

jan-wassenberg avatar Sep 05 '22 14:09 jan-wassenberg

with gcc-10.4.0: latest-git+patch.log.gz

this does not look good :-S

stefson avatar Sep 05 '22 14:09 stefson

> this does not look good :-S

try:

  • https://github.com/google/highway/pull/966#issuecomment-1237150317

malaterre avatar Sep 05 '22 14:09 malaterre

force-push it and ping me any time for results

stefson avatar Sep 05 '22 14:09 stefson

@stefson can you try 55010e4b126d222acd6906ebdb32f723f94ccafb ?

malaterre avatar Sep 06 '22 09:09 malaterre

it seems the compile is fixed by this commit: https://github.com/google/highway/commit/864d97bc74de6681d3e5e382582ddaa2a0837426

patching https://github.com/google/highway/commit/55010e4b126d222acd6906ebdb32f723f94ccafb on top of current git fails:

/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10974:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_f32(float32_t*, float32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10944:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s16(int16_t*, int16x8_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10974:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_f32(float32_t*, float32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10944:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s16(int16_t*, int16x8_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10951:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s32(int32_t*, int32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10951:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s32(int32_t*, int32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s64(int64_t*, int64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s64(int64_t*, int64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch

full build log: build.log.gz

stefson avatar Sep 06 '22 11:09 stefson

for aarch64, current git does fail with many errors (full log: aarch64-current-git-build.log.gz), which is fixed with the proposed patch from https://github.com/google/highway/commit/55010e4b126d222acd6906ebdb32f723f94ccafb

stefson avatar Sep 06 '22 11:09 stefson

Thank you, then we'll commit that patch shortly :)

jan-wassenberg avatar Sep 06 '22 12:09 jan-wassenberg

yeah, let's watch the fireworks

stefson avatar Sep 06 '22 12:09 stefson

armv7-gcc still broken with commit https://github.com/google/highway/commit/99340469dd310055f8f269ebe1621c9aaaa79322 , here is the build log: build.log.gz

stefson avatar Sep 06 '22 19:09 stefson

Thanks for sharing the result. I was unable to reproduce it with GCC 10.3 (godbolt lacks 10.4), either with -O2 -march=armv7-a -mfpu=vfpv3-d16 or with your flags -O2 -mfloat-abi=hard -mfpu=vfpv3-d16 -marm -mlibarch=armv7-a+fp -march=armv7-a+fp. https://gcc.godbolt.org/z/KrYz818xY

jan-wassenberg avatar Sep 07 '22 08:09 jan-wassenberg

can you please name me the gcc versions (gcc-10.3.0 and later) which godbolt offers you? (Edit: I meant versions :D )

stefson avatar Sep 07 '22 08:09 stefson

You can see them in the dropdown menu in the link above, where it currently says "ARM GCC 10.3.1" :) The next higher one is 11.1.

jan-wassenberg avatar Sep 07 '22 09:09 jan-wassenberg

ah, got it! :D

I can offer you a log of a failed compile with gcc-11.3.0, which seems identical to me: gcc-11.3.0-armv7a.log.gz

stefson avatar Sep 07 '22 09:09 stefson

:) The question is not whether we can get it to fail with other compilers. Instead the problem appears to be the configuration of the compiler, because it works (see godbolt link) with 11.3 and the flags specified there. Have you compiled gcc from source, or is it from a binary release?

jan-wassenberg avatar Sep 07 '22 09:09 jan-wassenberg