highway
inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch 11002 | vst1q_u64 (uint64_t * __a, uint64x2_t __b)
hi, this is most likely a regression from https://github.com/google/highway/commit/d8867c95df5c5bcd33562b3a24c96f5a54d298a8
compiler is gcc-10.4.0 on armhf
[8/38] /usr/bin/armv7a-unknown-linux-gnueabihf-g++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999 -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc
FAILED: CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o
/usr/bin/armv7a-unknown-linux-gnueabihf-g++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999 -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:23,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h: In function ‘std::vector<unsigned int> hwy::SupportedAndGeneratedTargets()’:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/detect_targets.h:438:50: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
438 | #define HWY_TARGETS (HWY_ATTAINABLE_TARGETS & (2 * HWY_STATIC_TARGET - 1))
| ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:69:48: note: in expansion of macro ‘HWY_TARGETS’
69 | for (uint32_t targets = SupportedTargets() & HWY_TARGETS; targets != 0;
| ^~~~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:25,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h: In member function ‘size_t hwy::ChosenTarget::GetIndex() const’:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/detect_targets.h:438:50: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
438 | #define HWY_TARGETS (HWY_ATTAINABLE_TARGETS & (2 * HWY_STATIC_TARGET - 1))
| ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:157:7: note: in definition of macro ‘HWY_CHOSEN_TARGET_SHIFT’
157 | ((((X) >> (HWY_HIGHEST_TARGET_BIT + 1 - HWY_MAX_DYNAMIC_TARGETS)) & \
| ^
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:163:28: note: in expansion of macro ‘HWY_TARGETS’
163 | (HWY_CHOSEN_TARGET_SHIFT(HWY_TARGETS) | HWY_CHOSEN_TARGET_MASK_SCALAR | 1u)
| ^~~~~~~~~~~
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.h:267:47: note: in expansion of macro ‘HWY_CHOSEN_TARGET_MASK_TARGETS’
267 | HWY_CHOSEN_TARGET_MASK_TARGETS);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:29,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:322,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h: In function ‘void hwy::N_NEON::StoreU(hwy::N_NEON::Vec128<long long unsigned int, 2>, hwy::N_NEON::Full128<long long unsigned int>, uint64_t*)’:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
11002 | vst1q_u64 (uint64_t * __a, uint64x2_t __b)
| ^~~~~~~~~
In file included from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:322,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits128-inl.h:27,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:23,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
from /var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_128a.cc:20:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:2744:12: note: called from here
2744 | vst1q_u64(unaligned, v.raw);
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~
full build log: build.log.zip
@stefson hm, I'm mystified 😦 Here is the function definition I see from armhf gcc 10:
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vst1q_u64 (uint64_t * __a, uint64x2_t __b)
{
__builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
}
Unlike the p64 version, which sets extra attributes, I don't see any here. If you comment out the call to vst1q_u64, are there any other errors?
For the warning, I think we can fix that by replacing 2 with 2ULL; we'll anyway soon make targets 64-bit.
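To illustrate the suggested `2ULL` change, here is a minimal sketch. The constants `kStaticTarget` and `kAttainable` are hypothetical stand-ins, not Highway's real definitions. The only assumption is that the static target occupies bit 30 of an `int`, which matches the `-2147483648` result in the warning.

```cpp
#include <cstdint>

// Hypothetical stand-ins for HWY_STATIC_TARGET and HWY_ATTAINABLE_TARGETS.
// Assumption: the static target is bit 30 of a 32-bit int.
constexpr int kStaticTarget = 1 << 30;
constexpr int kAttainable = (1 << 30) | (1 << 29);

// "2 * kStaticTarget" is evaluated in int and overflows (the -Woverflow
// warning in the log). With 2ULL the multiplication happens in 64 bits.
constexpr unsigned long long kMask = 2ULL * kStaticTarget - 1;
constexpr unsigned long long kTargets = kAttainable & kMask;

// The mask covers bit 30 and everything below it.
static_assert(kMask == 0x7FFFFFFFULL, "mask covers bits 0..30");
static_assert(kTargets == 0x60000000ULL, "both attainable bits survive");
```

The widening only changes the type in which the expression is evaluated; the set of selected bits stays the same.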
@stefson do you have any idea what might be happening? If not, it's an option to disable runtime dispatch for arm7 on this version of GCC.
could you guide me a little bit in how to disable runtime dispatch?
Sure, in detect_targets.h we have a line #if HWY_ARCH_X86 || (HWY_ARCH_ARM && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX). You can for example change HWY_ARCH_ARM to HWY_ARCH_ARM_A64.
do you mean as in:
diff --git a/hwy/detect_targets.h b/hwy/detect_targets.h
index afc9154..7be7770 100644
--- a/hwy/detect_targets.h
+++ b/hwy/detect_targets.h
@@ -372,7 +372,7 @@
// x86 compilers generally allow runtime dispatch. On Arm, currently only GCC
// does, and we require Linux to detect CPU capabilities.
-#if HWY_ARCH_X86 || (HWY_ARCH_ARM && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX)
+#if HWY_ARCH_X86 || (HWY_ARCH_ARM_A64 && HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX)
#define HWY_HAVE_RUNTIME_DISPATCH 1
#else
#define HWY_HAVE_RUNTIME_DISPATCH 0
?
Yes, looks good :) If this helps you, feel free to send this as a pull request, or we can do it if you prefer.
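For readers unfamiliar with the HWY_* macros, the condition in the diff can be approximated with compiler-predefined macros. This is only a sketch: every `SKETCH_*` name is hypothetical, and the mapping (for example HWY_ARCH_ARM_A64 to `__aarch64__`, HWY_COMPILER_GCC_ACTUAL to "GCC but not Clang") is an assumption rather than a copy of Highway's headers.

```cpp
// Assumed approximations of the HWY_* macros; all SKETCH_* names are
// hypothetical and exist only for this illustration.
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_ARCH_X86 1
#else
#define SKETCH_ARCH_X86 0
#endif

#if defined(__aarch64__)
#define SKETCH_ARCH_ARM_A64 1
#else
#define SKETCH_ARCH_ARM_A64 0
#endif

#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_COMPILER_GCC_ACTUAL 1
#else
#define SKETCH_COMPILER_GCC_ACTUAL 0
#endif

#if defined(__linux__)
#define SKETCH_OS_LINUX 1
#else
#define SKETCH_OS_LINUX 0
#endif

// Same shape as the patched condition: 32-bit Arm (armv7) no longer
// qualifies, so it falls back to a single statically chosen target.
#if SKETCH_ARCH_X86 || \
    (SKETCH_ARCH_ARM_A64 && SKETCH_COMPILER_GCC_ACTUAL && SKETCH_OS_LINUX)
#define SKETCH_HAVE_RUNTIME_DISPATCH 1
#else
#define SKETCH_HAVE_RUNTIME_DISPATCH 0
#endif

// Small helper so the result can be inspected at runtime.
inline const char* SketchDispatchMode() {
  return SKETCH_HAVE_RUNTIME_DISPATCH ? "runtime dispatch" : "static target only";
}
```

On an armv7 build neither branch of the condition holds, so the sketch reports "static target only".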
it helps indeed, but I need more time to iron this out - arm hardware really is slow.
I wonder about a sensible strategy for a fix on the compiler side?
I cannot reproduce any compilation issue on armhf/gcc10|11|12.
For reference:
% grep -3 vst1q_u64 /usr/lib/gcc/arm-linux-gnueabihf/*/include/arm_neon.h
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h- __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/10/include/arm_neon.h-}
--
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h- __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/11/include/arm_neon.h-}
--
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-__extension__ extern __inline void
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h:vst1q_u64 (uint64_t * __a, uint64x2_t __b)
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-{
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h- __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h-}
% gcc-10 --version
gcc-10 (Debian 10.4.0-4) 10.4.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
% gcc-11 --version
gcc-11 (Debian 11.3.0-5) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
% gcc-12 --version
gcc-12 (Debian 12.2.0-1) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I see the same results as you for my gcc-10 armv7a cross compiler, but still it gives me the error without the now pushed patch.
I see the same results as you for my gcc-10 armv7a cross compiler, but still it gives me the error without the now pushed patch.

Add '--verbose' to the compilation line that is failing and post back. Eg.:
/usr/bin/armv7a-unknown-linux-gnueabihf-g++ --verbose -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS [...]
hey, here is my output from --verbose, it is with commit https://github.com/google/highway/commit/9b3bd6d48445443afaa7211a4e72842373d28f1e to not hide the problem with the current workaround:
LANG=C /usr/bin/armv7a-unknown-linux-gnueabihf-g++ --verbose -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999 -O2 -pipe -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -c /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc
Using built-in specs.
COLLECT_GCC=/usr/bin/armv7a-unknown-linux-gnueabihf-g++
Target: armv7a-unknown-linux-gnueabihf
Configured with: /var/tmp/portage/cross-armv7a-unknown-linux-gnueabihf/gcc-10.4.0/work/gcc-10.4.0/configure --host=x86_64-pc-linux-gnu --target=armv7a-unknown-linux-gnueabihf --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/armv7a-unknown-linux-gnueabihf/gcc-bin/10.4.0 --includedir=/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include --datadir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0 --mandir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/man --infodir=/usr/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/info --with-gxx-include-dir=/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10 --with-python-dir=/share/gcc-data/armv7a-unknown-linux-gnueabihf/10.4.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 10.4.0 p5' --disable-esp --enable-libstdcxx-time --disable-libstdcxx-pch --enable-poison-system-directories --with-sysroot=/usr/armv7a-unknown-linux-gnueabihf --disable-bootstrap --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-fixed-point --with-float=hard --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-vtable-verify --disable-libvtv --without-zstd --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.4.0 (Gentoo 10.4.0 p5)
COLLECT_GCC_OPTIONS='-v' '-D' 'HWY_SHARED_DEFINE' '-D' 'hwy_contrib_EXPORTS' '-I' '/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999' '-O2' '-pipe' '-fomit-frame-pointer' '-fPIC' '-fvisibility=hidden' '-fvisibility-inlines-hidden' '-Wno-builtin-macro-redefined' '-D' '__DATE__="redacted"' '-D' '__TIMESTAMP__="redacted"' '-D' '__TIME__="redacted"' '-fmerge-all-constants' '-Wall' '-Wextra' '-Wconversion' '-Wsign-conversion' '-Wvla' '-Wnon-virtual-dtor' '-fmath-errno' '-fno-exceptions' '-MD' '-MT' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o' '-MF' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d' '-o' 'CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o' '-c' '-shared-libgcc' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mtls-dialect=gnu' '-marm' '-mlibarch=armv7-a+fp' '-march=armv7-a+fp'
/usr/libexec/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/cc1plus -quiet -v -I /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999 -MD CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.d -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o.d -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -D_GNU_SOURCE -D HWY_SHARED_DEFINE -D hwy_contrib_EXPORTS -D __DATE__="redacted" -D __TIMESTAMP__="redacted" -D __TIME__="redacted" /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc -quiet -dumpbase vqsort_i64d.cc -mfloat-abi=hard -mfpu=vfpv3-d16 -mtls-dialect=gnu -marm -mlibarch=armv7-a+fp -march=armv7-a+fp -auxbase-strip CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o -O2 -Wno-builtin-macro-redefined -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -version -fomit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno -fno-exceptions -o - |
/usr/libexec/gcc/armv7a-unknown-linux-gnueabihf/as -v -I /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999 -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5 -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o
GNU assembler version 2.36.1 (armv7a-unknown-linux-gnueabihf) using BFD version (Gentoo 2.36.1 p5) 2.36.1
Assembler messages:
Fatal error: can't create CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_i64d.cc.o: No such file or directory
GNU C++14 (Gentoo 10.4.0 p5) version 10.4.0 (armv7a-unknown-linux-gnueabihf)
compiled by GNU C version 10.4.0, GMP version 6.2.1, MPFR version 4.1.0-p13, MPC version 1.2.1, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/armv7a-unknown-linux-gnueabihf/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/../../../../armv7a-unknown-linux-gnueabihf/include"
#include "..." search starts here:
#include <...> search starts here:
/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10/armv7a-unknown-linux-gnueabihf
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/g++-v10/backward
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include-fixed
/usr/armv7a-unknown-linux-gnueabihf/usr/include
End of search list.
GNU C++14 (Gentoo 10.4.0 p5) version 10.4.0 (armv7a-unknown-linux-gnueabihf)
compiled by GNU C version 10.4.0, GMP version 6.2.1, MPFR version 4.1.0-p13, MPC version 1.2.1, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: d32c7f800b89674769804ef9c6a8ad26
In file included from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:29,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:358,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits-inl.h:27,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:23,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:20:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h: In function 'void hwy::N_NEON::StoreU(hwy::N_NEON::Vec128<long long int, 2>, hwy::N_NEON::Full128<long long int>, int64_t*)':
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to 'always_inline' 'void vst1q_s64(int64_t*, int64x2_t)': target specific option mismatch
10958 | vst1q_s64 (int64_t * __a, int64x2_t __b)
| ^~~~~~~~~
In file included from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/highway.h:358,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/shared-inl.h:103,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/traits-inl.h:27,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:23,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/foreach_target.h:81,
from /var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/contrib/sort/vqsort_i64d.cc:20:
/var/tmp/portage/dev-cpp/highway-9999/work/highway-9999/hwy/ops/arm_neon-inl.h:2765:12: note: called from here
2765 | vst1q_s64(unaligned, v.raw);
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~
frankly, I don't see any difference in the error. Does the other information tell you anything?
@jan-wassenberg Do you believe it makes sense to compile highway with neon support using the default -mfpu=vfpv3-d16 (generic-armv7-a defaults to vfpv3-d16)? ...
@malaterre, good catch, thanks for pointing to that. vfpv4 has been supported since 2009, I'd be surprised if anyone still cares about vfpv3. set_macros-inl.h does:
#if HWY_ARCH_ARM_V7
#define HWY_TARGET_STR "+neon-vfpv4"
It makes sense that the compiler complains: arm_neon.h is compiled with the default target (vfpv3-d16 here), and we set vfpv4 only for the Highway implementation and user code. The always_inline intrinsics and their callers therefore have different target options.
Here's an idea, @stefson: does it help to move the following block in arm_neon-inl.h to the line after HWY_BEFORE_NAMESPACE();?
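The "target specific option mismatch" rule can be sketched with a deliberately different architecture: the example below uses x86 and the `avx2` target attribute as an analogy, because that is easy to try on a desktop GCC or Clang. It is not the Arm configuration from this issue, and the `Sketch*` names are hypothetical.

```cpp
// Analogy only: x86 "avx2" stands in for the Arm "+neon-vfpv4" target string.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_TARGET __attribute__((target("avx2")))
#else
#define SKETCH_TARGET  // other platforms: no per-function target in this sketch
#endif

// Stand-in for an intrinsic wrapper: always_inline with a specific target.
SKETCH_TARGET inline __attribute__((always_inline)) int SketchIntrinsic(int x) {
  return x + 1;
}

// Accepted: the caller carries the same target options as the callee,
// so the compiler is able to inline the call.
SKETCH_TARGET int SketchCallerMatching(int x) { return SketchIntrinsic(x); }

// With GCC on x86, a caller built with the default target is expected to be
// rejected with "inlining failed in call to 'always_inline' ...: target
// specific option mismatch", so it is left commented out:
// int SketchCallerDefault(int x) { return SketchIntrinsic(x); }
```

The design point is that always_inline leaves the compiler no fallback: if caller and callee options are incompatible, it cannot emit an out-of-line call instead and reports an error.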
HWY_DIAGNOSTICS(push)
HWY_DIAGNOSTICS_OFF(disable : 4701, ignored "-Wuninitialized")
#include <arm_neon.h>
HWY_DIAGNOSTICS(pop)
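Why the position of the include could matter is sketched below, assuming that HWY_BEFORE_NAMESPACE() expands on GCC to something like `push_options` plus a `target` pragma. As above, x86 `avx2` is used as an analogy, `SketchHeaderHelper` stands in for a function from an included header, and all names are hypothetical. Whether this actually resolves the armv7 error is not established by the sketch.

```cpp
// Open a target region, roughly what HWY_BEFORE_NAMESPACE() is assumed to do
// on GCC. Limited to real GCC on x86 so the sketch compiles elsewhere too.
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#pragma GCC push_options
#pragma GCC target("avx2")
#define SKETCH_REGION_OPEN
#endif

// Stand-in for a function from a header included inside the region:
// it inherits the region's target options.
inline __attribute__((always_inline)) int SketchHeaderHelper(int x) {
  return 2 * x;
}

// User code in the same region has the same options, so inlining succeeds.
int SketchUserCode(int x) { return SketchHeaderHelper(x) + 1; }

#ifdef SKETCH_REGION_OPEN
#pragma GCC pop_options  // restore the command-line target options
#endif
```

If the helper were instead defined before the pragma, it would keep the command-line target while the caller gets the region's target, which is the situation described for arm_neon.h above.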
Can you please post a patch against latest git for your idea? The risk of a misunderstanding is too high if you ask me that way :D
Sure, sent :)
this does not look good :-S
try:
- https://github.com/google/highway/pull/966#issuecomment-1237150317
push force it and ping me any time for results
@stefson can you try 55010e4b126d222acd6906ebdb32f723f94ccafb ?
it seems the compile is fixed by this commit: https://github.com/google/highway/commit/864d97bc74de6681d3e5e382582ddaa2a0837426
patching https://github.com/google/highway/commit/55010e4b126d222acd6906ebdb32f723f94ccafb on top of current git fails:
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10974:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_f32(float32_t*, float32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10944:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s16(int16_t*, int16x8_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10974:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_f32(float32_t*, float32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10944:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s16(int16_t*, int16x8_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10951:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s32(int32_t*, int32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10951:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s32(int32_t*, int32x4_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s64(int64_t*, int64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:10958:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_s64(int64_t*, int64x2_t)’: target specific option mismatch
/usr/lib/gcc/armv7a-unknown-linux-gnueabihf/10.4.0/include/arm_neon.h:11002:1: error: inlining failed in call to ‘always_inline’ ‘void vst1q_u64(uint64_t*, uint64x2_t)’: target specific option mismatch
full build log: build.log.gz
for aarch64, current git fails with many errors (full log: aarch64-current-git-build.log.gz), which are fixed by the proposed patch from https://github.com/google/highway/commit/55010e4b126d222acd6906ebdb32f723f94ccafb
Thank you, then we'll commit that patch shortly :)
yeah, let's watch the fireworks
armv7-gcc still broken with commit https://github.com/google/highway/commit/99340469dd310055f8f269ebe1621c9aaaa79322 , here is the build log: build.log.gz
Thanks for sharing the result. I was unable to reproduce it with GCC 10.3 (godbolt lacks 10.4), using either -O2 -march=armv7-a -mfpu=vfpv3-d16 or your flags, -O2 -mfloat-abi=hard -mfpu=vfpv3-d16 -marm -mlibarch=armv7-a+fp -march=armv7-a+fp.
https://gcc.godbolt.org/z/KrYz818xY
can you please name me the gcc versions (gcc-10.3.0 and later) which godbolt offers you? (Edit: I meant versions :D )
You can see them in the dropdown menu in the link above, where it currently says "ARM GCC 10.3.1" :) The next higher one is 11.1.
ah, got it! :D
I can offer you a log of a failed compile with gcc-11.3.0, which looks identical to me: gcc-11.3.0-armv7a.log.gz
:) The question is not whether we can get it to fail with other compilers. Instead the problem appears to be the configuration of the compiler, because it works (see godbolt link) with 11.3 and the flags specified there. Have you compiled gcc from source, or is it from a binary release?