Corrfunc Compiling on Apple M1

Is your feature request related to a problem? Please describe.
Corrfunc cannot currently be compiled or run on the (late-2020) Apple M1 laptops: the build fails.

Describe the solution you'd like
Run Corrfunc on the new M1 laptops, preferably with optimised kernels (which would require kernels written with the NEON ISA).

Describe alternatives you've considered
N/A

Additional context
The new M1 chip implements the 64-bit ARMv8 instruction set, including 128-bit NEON SIMD. Most of the codebase was written assuming that __DARWIN__ implies x86_64, but that is no longer the case. (I am unsure whether the new platform defines __aarch64__, __arm64__ or both.)
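A minimal sketch of a platform check that does not assume an Apple build is x86_64. It tests both __aarch64__ and __arm64__, since either spelling may be the one a given compiler defines, and uses __APPLE__ to detect macOS. The enum and function names are hypothetical, not part of Corrfunc:

```c
/* Hypothetical platform tags for illustration; not part of Corrfunc. */
typedef enum {
    PLATFORM_UNKNOWN = 0,
    PLATFORM_MACOS_X86_64,
    PLATFORM_MACOS_ARM64,
    PLATFORM_OTHER_X86_64,
    PLATFORM_OTHER_ARM64
} target_platform;

/* Resolve the compile-time target from predefined macros, checking the
   architecture first so that "Apple" never implies "x86_64". */
static target_platform detect_target_platform(void)
{
#if defined(__aarch64__) || defined(__arm64__)
#  if defined(__APPLE__)
    return PLATFORM_MACOS_ARM64;
#  else
    return PLATFORM_OTHER_ARM64;
#  endif
#elif defined(__x86_64__)
#  if defined(__APPLE__)
    return PLATFORM_MACOS_X86_64;
#  else
    return PLATFORM_OTHER_X86_64;
#  endif
#else
    return PLATFORM_UNKNOWN;
#endif
}
```

Code paths that are x86-only (such as the cpuid-based detection) could then be enabled only for the two x86_64 values instead of for every Apple build.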

@karlglazebrook has very kindly debugged the install from a source tarball (pip install failed). The current error occurs when compiling cpu_features.c:

/usr/local/bin/gcc  -DVERSION=\"2.3.4\" -DUSE_UNICODE -std=c99 -g -Wsign-compare -Wall -Wextra -Wshadow -Wunused -fPIC -D_POSIX_SOURCE=200809L -D_GNU_SOURCE -D_DARWIN_C_SOURCE -O3  -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param simultaneous-prefetches=4  -Wa,-q -fopenmp -funroll-loops -march=native -fno-strict-aliasing -Wformat=2  -Wpacked  -Wnested-externs -Wpointer-arith  -Wredundant-decls  -Wfloat-equal -Wcast-qual -Wcast-align -Wmissing-declarations -Wmissing-prototypes  -Wnested-externs -Wstrict-prototypes   -Wno-unused-local-typedefs  -I../../io -I../../utils  -c ../../utils/cpu_features.c -o ../../utils/cpu_features.o
In file included from ../../utils/cpu_features.c:13:
../../utils/cpu_features.c: In function ‘runtime_instrset_detect’:
../../utils/cpu_features.h:41:4: error: impossible constraint in ‘asm’
   41 |    __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );
      |    ^~~~~
../../utils/cpu_features.h:41:4: error: impossible constraint in ‘asm’
   41 |    __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );

One solution could be to change this line:

#if defined(__GNUC__) || defined(__clang__)              // use inline assembly, Gnu/AT&T syntax

to something like

#if (defined(__GNUC__) || defined(__clang__)) && !defined(__arm64__)   // use inline assembly, Gnu/AT&T syntax
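A sketch of an alternative guard keyed on the target architecture rather than on the compiler, so the x86 cpuid inline assembly is only ever compiled for x86_64 and every other target gets a stub instead of falling through to another assembly branch. The helper name cpuid_if_available is hypothetical; a Windows/MSVC build would still need its own #elif defined(_MSC_VER) case, which is not shown:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: on x86_64 with GCC/clang, run CPUID for the given
   leaf (sub-leaf 0), store eax, ebx, ecx, edx in regs[0..3] and return 1.
   On any other target (e.g. arm64 / Apple M1), zero regs and return 0. */
static int cpuid_if_available(uint32_t leaf, uint32_t regs[4])
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    uint32_t a, b, c, d;
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "a"(leaf), "c"(0));
    regs[0] = a;
    regs[1] = b;
    regs[2] = c;
    regs[3] = d;
    return 1;
#else
    (void)leaf; /* unused when CPUID does not exist */
    memset(regs, 0, 4 * sizeof(uint32_t));
    return 0;
#endif
}
```

Callers can treat a return value of 0 as "no x86 features to query" and select the FALLBACK kernels.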

Jan 15 '21 05:01 manodeep

I've tried making that line both explicitly true and explicitly false. If I set it to '#if 0' I get a different error:

../../utils/cpu_features.h:49:11: error: expected ‘(’ before ‘{’ token
   49 |     __asm {
      |           ^
      |           (
../../utils/cpu_features.h:50:9: error: unknown type name ‘mov’
   50 |         mov eax, functionnumber
      |         ^~~
../../utils/cpu_features.h:51:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘xor’
   51 |         xor ecx, ecx
      |         ^~~
../../utils/cpu_features.h:53:9: error: unknown type name ‘mov’
   53 |         mov esi, output
      |         ^~~
../../utils/cpu_features.h:54:9: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mov’
   54 |         mov [esi],    eax
      |         ^~~
../../utils/cpu_features.h:53:13: warning: unused variable ‘esi’ [-Wunused-variable]
   53 |         mov esi, output
      |             ^~~
../../utils/cpu_features.h:50:13: warning: unused variable ‘eax’ [-Wunused-variable]
   50 |         mov eax, functionnumber
      |             ^~~
../../utils/cpu_features.h:64:23: error: invalid storage class for function ‘xgetbv’
   64 | static inline int64_t xgetbv (int ctr) {
      |                       ^~~~~~
../../utils/cpu_features.h:86:12: warning: nested extern declaration of ‘runtime_instrset_detect’ [-Wnested-externs]
   86 | extern int runtime_instrset_detect(void);
      |            ^~~~~~~~~~~~~~~~~~~~~~~
../../utils/cpu_features.h:87:12: warning: nested extern declaration of ‘get_max_usable_isa’ [-Wnested-externs]
   87 | extern int get_max_usable_isa(void);

Is this just incorrect C syntax that went unnoticed because this branch has not been compiled in many years?

Jan 15 '21 05:01 karlglazebrook

@karlglazebrook Huh - now this is erroring in the next function. I wonder if the compilation works for only these lines?

Jan 15 '21 06:01 manodeep

It is also erroring at line 49, which is in the #else branch of the first #if. Yes, it is also erroring in the next lot, but one thing at a time?

For lines 38-60: judging by the comments, the #else branch is clearly Intel-only, whereas the first branch gives the error initially reported. That branch is commented as 'inline assembly, Gnu/AT&T syntax'. I am guessing the problem is that neither kind of assembly is valid for ARM.

Jan 15 '21 07:01 karlglazebrook

Ahh yes - thanks for clarifying! I got thrown off by the line numbers. You are quite right - the assembly syntax is different for ARM. Let me try to come up with a solution...

Documenting what I have found so far: one solution that works for clang is to add the following guard everywhere:

#if defined(__ARM_NEON__)
    return FALLBACK;  /* FALLBACK == 0: only compiles the "FALLBACK" kernels */
#elif ...
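The same idea can be sketched as a run-time detection function that returns the FALLBACK level on 64-bit ARM and only probes x86 features on x86_64. The enum names and values are hypothetical (Corrfunc's own ISA enum may differ), and GCC/clang's __builtin_cpu_supports is swapped in here for the hand-written cpuid code. Both __ARM_NEON (the ACLE spelling) and __ARM_NEON__ are tested, since a compiler may define only one of them:

```c
/* Hypothetical ISA levels; FALLBACK is 0, matching "return 0/FALLBACK". */
typedef enum {
    ISA_FALLBACK = 0,
    ISA_SSE42,
    ISA_AVX,
    ISA_AVX2
} sketch_isa;

/* Return the best ISA level the (hypothetical) kernels could use. */
static sketch_isa sketch_runtime_isa_detect(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(__aarch64__)
    /* No NEON kernels yet: only the portable FALLBACK kernels are usable. */
    return ISA_FALLBACK;
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    /* The compiler builtins query CPUID at run time for us. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return ISA_AVX2;
    if (__builtin_cpu_supports("avx"))    return ISA_AVX;
    if (__builtin_cpu_supports("sse4.2")) return ISA_SSE42;
    return ISA_FALLBACK;
#else
    return ISA_FALLBACK; /* unknown target: stay portable */
#endif
}
```

Keeping the ARM check first means no x86-specific code is ever seen by an arm64 compiler, regardless of which compiler is used.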

It looks like these registers might hold the appropriate values. See also the relevant SO entry and the ARM docs for writing inline assembly.
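Reading the ID registers directly may not be permitted from user space on every OS, so a common alternative is to ask the OS instead. A minimal sketch, assuming the macOS sysctl key hw.optional.neon and Linux's getauxval(AT_HWCAP) with the HWCAP_ASIMD bit; if the lookup is unavailable or fails, it falls back to the compile-time NEON macro. The function name neon_available is hypothetical:

```c
#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
#  include <sys/types.h>
#  include <sys/sysctl.h>
#elif defined(__linux__) && defined(__aarch64__)
#  include <sys/auxv.h>
#endif

/* Hypothetical helper: return 1 if Advanced SIMD (NEON) is usable, else 0. */
static int neon_available(void)
{
#if defined(__aarch64__) || defined(__arm64__)
#  if defined(__APPLE__)
    int value = 0;
    size_t len = sizeof(value);
    /* Assumed sysctl key; on failure fall through to the macro check. */
    if (sysctlbyname("hw.optional.neon", &value, &len, NULL, 0) == 0)
        return value != 0;
#  elif defined(__linux__) && defined(HWCAP_ASIMD)
    /* Linux reports Advanced SIMD support in the AT_HWCAP auxiliary vector. */
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#  endif
#  if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1; /* compiler was told NEON is available */
#  else
    return 0;
#  endif
#else
    return 0; /* not a 64-bit Arm target */
#endif
}
```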

Jan 15 '21 19:01 manodeep

@karlglazebrook Do you mind copy-pasting the output of:

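#!/bin/bash
# Dump the macros each compiler predefines for this OS + instruction set.
declare -a compilers=("/usr/bin/clang" "/usr/local/bin/gcc -fopenmp")
for cc in "${compilers[@]}"
do
    echo "*** $cc ***"
    # $cc is deliberately unquoted so bash splits "gcc -fopenmp" into a
    # command plus a flag (zsh does not word-split unquoted variables).
    $cc -std=c99 -march=native -O3 -dM -E - < /dev/null
    echo "*** $cc done ***"
done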

This will show which preprocessor macros each compiler predefines for the OS and instruction set.

Jan 15 '21 22:01 manodeep

Sure. Note that I had to change that line to

eval "$cc -std=c99 -march=native -O3 -dM -E - < /dev/null"

otherwise I got the error

test.sh:6: no such file or directory: /usr/local/bin/gcc -fopenmp

Output with the script as is (including -march=native):

*** /usr/bin/clang *** clang: error: the clang compiler does not support '-march=native' *** /usr/bin/clang done *** *** /usr/local/bin/gcc -fopenmp *** #define DBL_MIN_EXP (-1021) #define UINT_LEAST16_MAX 0xffff #define __ARM_SIZEOF_WCHAR_T 4 #define DBL_DECIMAL_DIG 17 #define __ATOMIC_ACQUIRE 2 #define FLT_MIN 1.1754943508222875e-38F #define __GCC_IEC_559_COMPLEX 2 #define UINT_LEAST8_TYPE unsigned char #define __INTMAX_C(c) c ## L #define UINT8_MAX 0xff #define WCHAR_MAX 0x7fffffff #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 #define __GCC_IEC_559 2 #define FLT32X_DECIMAL_DIG 17 #define FLT_EVAL_METHOD 0 #define FLT64_DECIMAL_DIG 17 #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 #define UINT_FAST32_TYPE unsigned int #define UINT_FAST64_MAX 0xffffffffffffffffULL #define DBL_MIN_10_EXP (-307) #define FINITE_MATH_ONLY 0 #define FLT32X_MAX_EXP 1024 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 #define GNUC_PATCHLEVEL 0 #define FLT32_HAS_DENORM 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 #define UINT_FAST8_MAX 0xff #define __INT8_C(c) c #define __ARM_64BIT_STATE 1 #define INT_LEAST8_WIDTH 8 #define INTMAX_TYPE long int #define UINT_LEAST64_MAX 0xffffffffffffffffULL #define SHRT_MAX 0x7fff #define LDBL_MAX 1.7976931348623157e+308L #define __ARM_FEATURE_IDIV 1 #define LDBL_IS_IEC_60559 2 #define __ARM_FP 14 #define DYNAMIC 1 #define UINT_LEAST8_MAX 0xff #define APPLE_CC 1 #define UINTMAX_TYPE long unsigned int #define FLT_EVAL_METHOD_TS_18661_3 0 #define UINT32_MAX 0xffffffffU #define DBL_DENORM_MIN ((double)4.9406564584124654e-324L) #define AARCH64_CMODEL_SMALL 1 #define LDBL_MAX_EXP 1024 #define CHAR_BIT 8 #define FLT32X_IS_IEC_60559 2 #define INT_LEAST16_WIDTH 16 #define __ARM_ALIGN_MAX_STACK_PWR 16 #define SCHAR_MAX 0x7f #define DBL_MAX ((double)1.7976931348623157e+308L) #define WCHAR_MIN (-WCHAR_MAX - 1) #define __INT64_C(c) c ## LL #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 #define 
SIZEOF_INT 4 #define INT_FAST64_WIDTH 64 #define __PRAGMA_REDEFINE_EXTNAME 1 #define FLT32X_MANT_DIG 53 #define USER_LABEL_PREFIX _ #define FLT32_MAX_10_EXP 38 #define STDC_HOSTED 1 #define DBL_DIG 15 #define FLT32_DIG 6 #define FLT_EPSILON 1.1920928955078125e-7F #define SHRT_WIDTH 16 #define FLT32_IS_IEC_60559 2 #define LDBL_MIN 2.2250738585072014e-308L #define WINT_TYPE int #define FLT16_HAS_QUIET_NAN 1 #define __strong #define __ARM_SIZEOF_MINIMAL_ENUM 4 #define __FP_FAST_FMA 1 #define FLT32X_HAS_INFINITY 1 #define INT32_MAX 0x7fffffff #define INT_WIDTH 32 #define SIZEOF_LONG 8 #define APPLE 1 #define __UINT16_C(c) c #define DECIMAL_DIG 17 #define FLT64_EPSILON 2.2204460492503131e-16F64 #define INT16_MAX 0x7fff #define LDBL_HAS_QUIET_NAN 1 #define FLT16_MIN_EXP (-13) #define FLT64_MANT_DIG 53 #define LDBL_MANT_DIG 53 #define GNUC 11 #define FLT_HAS_DENORM 1 #define SIZEOF_LONG_DOUBLE 8 #define LDBL_MIN_EXP (-1021) #define FLT64_MAX_10_EXP 308 #define FLT16_MAX_10_EXP 4 #define DBL_IS_IEC_60559 2 #define FLT32_HAS_INFINITY 1 #define LDBL_HAS_DENORM 1 #define DBL_HAS_INFINITY 1 #define __HAVE_SPECULATION_SAFE_VALUE 1 #define INTPTR_WIDTH 64 #define FLT32X_HAS_DENORM 1 #define INT_FAST16_TYPE short int #define STRICT_ANSI 1 #define FLT32_DECIMAL_DIG 9 #define INT_LEAST32_MAX 0x7fffffff #define __weak #define DBL_MAX_EXP 1024 #define WCHAR_WIDTH 32 #define FLT32_MAX 3.4028234663852886e+38F32 #define __GCC_ATOMIC_LONG_LOCK_FREE 2 #define FLT16_DECIMAL_DIG 5 #define FLT_IS_IEC_60559 2 #define FLT32_HAS_QUIET_NAN 1 #define LONG_LONG_MAX 0x7fffffffffffffffLL #define SIZEOF_SIZE_T 8 #define SIG_ATOMIC_WIDTH 32 #define __ARM_ALIGN_MAX_PWR 28 #define SIZEOF_WINT_T 4 #define LONG_LONG_WIDTH 64 #define FLT32_MAX_EXP 128 #define __ARM_FP16_FORMAT_IEEE 1 #define FLT_MIN_EXP (-125) #define FLT64_NORM_MAX 1.7976931348623157e+308F64 #define FLT32X_MIN_EXP (-1021) #define INT_FAST64_TYPE long long int #define __ARM_FP16_ARGS 1 #define __FP_FAST_FMAF 1 #define __FP_FAST_FMAL 1 
#define FLT64_DENORM_MIN 4.9406564584124654e-324F64 #define DBL_MIN ((double)2.2250738585072014e-308L) #define __ARM_FEATURE_CLZ 1 #define FLT16_DENORM_MIN 5.9604644775390625e-8F16 #define SIZEOF_POINTER 8 #define __GXX_ABI_VERSION 1015 #define SIZE_TYPE long unsigned int #define LP64 1 #define DBL_HAS_QUIET_NAN 1 #define FLT_EVAL_METHOD_C99 0 #define FLT32X_EPSILON 2.2204460492503131e-16F32x #define FLT64_MIN_EXP (-1021) #define UINT64_MAX 0xffffffffffffffffULL #define LDBL_DECIMAL_DIG 17 #define FLT_MAX 3.4028234663852886e+38F #define aarch64 1 #define FLT64_MIN_10_EXP (-307) #define REGISTER_PREFIX #define UINT16_MAX 0xffff #define LDBL_HAS_INFINITY 1 #define FLT_DIG 6 #define DEC_EVAL_METHOD 2 #define FLT_MANT_DIG 24 #define FLT16_MIN_10_EXP (-4) #define VERSION "11.0.0 20201128 (experimental)" #define __UINT64_C(c) c ## ULL #define WINT_MAX 0x7fffffff #define __GCC_ATOMIC_INT_LOCK_FREE 2 #define FLT32X_MIN 2.2250738585072014e-308F32x #define FLT32_MANT_DIG 24 #define AARCH64EL 1 #define FLOAT_WORD_ORDER ORDER_LITTLE_ENDIAN #define FLT16_MAX_EXP 16 #define BIGGEST_ALIGNMENT 16 #define __INT32_C(c) c #define FLT16_DIG 3 #define SCHAR_WIDTH 8 #define ORDER_PDP_ENDIAN 3412 #define INT_FAST32_TYPE int #define UINT_LEAST16_TYPE short unsigned int #define ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED 110000 #define __ARM_FEATURE_FMA 1 #define INT8_TYPE signed char #define SIG_ATOMIC_TYPE int #define GCC_ASM_FLAG_OUTPUTS 1 #define arm64 1 #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 #define FLT_RADIX 2 #define INT_LEAST16_TYPE short int #define __ARM_ARCH_PROFILE 65 #define LDBL_EPSILON 2.2204460492503131e-16L #define __UINTMAX_C(c) c ## UL #define __ARM_PCS_AAPCS64 1 #define SIG_ATOMIC_MAX 0x7fffffff #define OPTIMIZE 1 #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 #define SIZEOF_PTRDIFF_T 8 #define __arm64 1 #define __ATOMIC_RELAXED 0 #define INT_FAST32_WIDTH 32 #define LDBL_DIG 15 #define FLT64_IS_IEC_60559 2 #define FLT16_IS_IEC_60559 2 #define FLT64_DIG 15 #define 
UINT_FAST32_MAX 0xffffffffU #define UINT_LEAST64_TYPE long long unsigned int #define FLT16_EPSILON 9.7656250000000000e-4F16 #define FLT_HAS_QUIET_NAN 1 #define FLT_MAX_10_EXP 38 #define LONG_MAX 0x7fffffffffffffffL #define FLT_HAS_INFINITY 1 #define DBL_HAS_DENORM 1 #define UINT_FAST16_TYPE short unsigned int #define FLT32X_HAS_QUIET_NAN 1 #define CHAR16_TYPE short unsigned int #define SIZE_WIDTH 64 #define INTMAX_WIDTH 64 #define INT_LEAST16_MAX 0x7fff #define FLT16_NORM_MAX 6.5504000000000000e+4F16 #define INT64_MAX 0x7fffffffffffffffLL #define FLT32_DENORM_MIN 1.4012984643248171e-45F32 #define INT_LEAST64_TYPE long long int #define INT16_TYPE short int #define INT_LEAST8_TYPE signed char #define FLT16_MAX 6.5504000000000000e+4F16 #define STDC_VERSION 199901L #define INT_FAST8_MAX 0x7f #define __ARM_ARCH 8 #define INTPTR_MAX 0x7fffffffffffffffL #define __ARM_FEATURE_UNALIGNED 1 #define FLT64_HAS_QUIET_NAN 1 #define FLT32X_DIG 15 #define UINT8_TYPE unsigned char #define PTRDIFF_WIDTH 64 #define CONSTANT_CFSTRINGS 1 #define FLT64_HAS_INFINITY 1 #define FLT16_HAS_INFINITY 1 #define SIG_ATOMIC_MIN (-SIG_ATOMIC_MAX - 1) #define PTRDIFF_MAX 0x7fffffffffffffffL #define FLT16_MANT_DIG 11 #define INTPTR_TYPE long int #define UINT16_TYPE short unsigned int #define WCHAR_TYPE int #define pic 2 #define UINTPTR_MAX 0xffffffffffffffffUL #define __ARM_ARCH_8A 1 #define INT_FAST64_MAX 0x7fffffffffffffffLL #define FLT_NORM_MAX 3.4028234663852886e+38F #define UINT_FAST64_TYPE long long unsigned int #define INT_MAX 0x7fffffff #define INT64_TYPE long long int #define FLT_MAX_EXP 128 #define ORDER_BIG_ENDIAN 4321 #define DBL_MANT_DIG 53 #define INT_LEAST64_MAX 0x7fffffffffffffffLL #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 #define __FP_FAST_FMAF32 1 #define UINT_LEAST32_TYPE unsigned int #define SIZEOF_SHORT 2 #define FLT32_NORM_MAX 3.4028234663852886e+38F32 #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 #define FLT64_MAX 1.7976931348623157e+308F64 #define MACH 1 #define LITTLE_ENDIAN 1 
#define WINT_WIDTH 32 #define __FP_FAST_FMAF64 1 #define INT_LEAST8_MAX 0x7f #define INT_LEAST64_WIDTH 64 #define FLT32X_MAX_10_EXP 308 #define INT_FAST16_MAX 0x7fff #define SIZEOF_INT128 16 #define FLT16_MIN 6.1035156250000000e-5F16 #define LDBL_MAX_10_EXP 308 #define DBL_EPSILON ((double)2.2204460492503131e-16L) #define FLT32_MIN_EXP (-125) #define _LP64 1 #define __UINT8_C(c) c #define FLT64_MAX_EXP 1024 #define INT_LEAST32_TYPE int #define UINT64_TYPE long long unsigned int #define __ARM_NEON 1 #define INT_FAST32_MAX 0x7fffffff #define INTMAX_MAX 0x7fffffffffffffffL #define UINT_FAST8_TYPE unsigned char #define INT_FAST8_TYPE signed char #define GNUC_STDC_INLINE 1 #define FLT64_HAS_DENORM 1 #define _OPENMP 201511 #define FLT32_EPSILON 1.1920928955078125e-7F32 #define __FP_FAST_FMAF32x 1 #define FLT16_HAS_DENORM 1 #define INT_FAST8_WIDTH 8 #define FLT32X_MAX 1.7976931348623157e+308F32x #define DBL_NORM_MAX ((double)1.7976931348623157e+308L) #define BYTE_ORDER ORDER_LITTLE_ENDIAN #define LDBL_DENORM_MIN 4.9406564584124654e-324L #define SIZEOF_WCHAR_T 4 #define __UINT32_C(c) c ## U #define FLT_DENORM_MIN 1.4012984643248171e-45F #define WINT_MIN (-WINT_MAX - 1) #define INT8_MAX 0x7f #define LONG_WIDTH 64 #define PIC 2 #define FLT32X_NORM_MAX 1.7976931348623157e+308F32x #define CHAR32_TYPE unsigned int #define FLT32_MIN_10_EXP (-37) #define __ARM_FEATURE_NUMERIC_MAXMIN 1 #define INT32_TYPE int #define SIZEOF_DOUBLE 8 #define FLT_MIN_10_EXP (-37) #define FLT64_MIN 2.2250738585072014e-308F64 #define INT_LEAST32_WIDTH 32 #define SIZEOF_FLOAT 4 #define __ATOMIC_CONSUME 1 #define GNUC_MINOR 0 #define INT_FAST16_WIDTH 16 #define UINTMAX_MAX 0xffffffffffffffffUL #define FLT32X_DENORM_MIN 4.9406564584124654e-324F32x #define DBL_MAX_10_EXP 308 #define __INT16_C(c) c #define __ARM_ARCH_ISA_A64 1 #define STDC 1 #define PTRDIFF_TYPE long int #define FLT32_MIN 1.1754943508222875e-38F32 #define __ATOMIC_SEQ_CST 5 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1 #define UINT32_TYPE 
unsigned int #define FLT32X_MIN_10_EXP (-307) #define UINTPTR_TYPE long unsigned int #define LDBL_MIN_10_EXP (-307) #define SIZEOF_LONG_LONG 8 #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 #define FLT_DECIMAL_DIG 9 #define UINT_FAST16_MAX 0xffff #define LDBL_NORM_MAX 1.7976931348623157e+308L #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 #define ORDER_LITTLE_ENDIAN 1234 #define SIZE_MAX 0xffffffffffffffffUL #define UINT_LEAST32_MAX 0xffffffffU #define __ATOMIC_ACQ_REL 4 #define __ATOMIC_RELEASE 3 *** /usr/local/bin/gcc -fopenmp done ***

Output after removing -march=native:

*** /usr/bin/clang *** #define _LP64 1 #define AARCH64EL 1 #define AARCH64_SIMD 1 #define APPLE_CC 6000 #define APPLE 1 #define ARM64_ARCH_8 1 #define __ARM_64BIT_STATE 1 #define __ARM_ACLE 200 #define __ARM_ALIGN_MAX_STACK_PWR 4 #define __ARM_ARCH 8 #define ARM_ARCH_8_3 1 #define __ARM_ARCH_ISA_A64 1 #define __ARM_ARCH_PROFILE 'A' #define __ARM_FEATURE_CLZ 1 #define __ARM_FEATURE_COMPLEX 1 #define __ARM_FEATURE_CRC32 1 #define __ARM_FEATURE_CRYPTO 1 #define __ARM_FEATURE_DIRECTED_ROUNDING 1 #define __ARM_FEATURE_DIV 1 #define __ARM_FEATURE_FMA 1 #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1 #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1 #define __ARM_FEATURE_IDIV 1 #define __ARM_FEATURE_JCVT 1 #define __ARM_FEATURE_LDREX 0xF #define __ARM_FEATURE_NUMERIC_MAXMIN 1 #define __ARM_FEATURE_QRDMX 1 #define __ARM_FEATURE_UNALIGNED 1 #define __ARM_FP 0xE #define __ARM_FP16_ARGS 1 #define __ARM_FP16_FORMAT_IEEE 1 #define __ARM_NEON 1 #define __ARM_NEON_FP 0xE #define ARM_NEON 1 #define __ARM_PCS_AAPCS64 1 #define __ARM_SIZEOF_MINIMAL_ENUM 4 #define __ARM_SIZEOF_WCHAR_T 4 #define __ATOMIC_ACQUIRE 2 #define __ATOMIC_ACQ_REL 4 #define __ATOMIC_CONSUME 1 #define __ATOMIC_RELAXED 0 #define __ATOMIC_RELEASE 3 #define __ATOMIC_SEQ_CST 5 #define BIGGEST_ALIGNMENT 8 #define BLOCKS 1 #define BYTE_ORDER ORDER_LITTLE_ENDIAN #define CHAR16_TYPE unsigned short #define CHAR32_TYPE unsigned int #define CHAR_BIT 8 #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 #define __CLANG_ATOMIC_INT_LOCK_FREE 2 #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 #define CONSTANT_CFSTRINGS 1 #define DBL_DECIMAL_DIG 17 #define DBL_DENORM_MIN 4.9406564584124654e-324 #define DBL_DIG 15 #define DBL_EPSILON 2.2204460492503131e-16 
#define DBL_HAS_DENORM 1 #define DBL_HAS_INFINITY 1 #define DBL_HAS_QUIET_NAN 1 #define DBL_MANT_DIG 53 #define DBL_MAX_10_EXP 308 #define DBL_MAX_EXP 1024 #define DBL_MAX 1.7976931348623157e+308 #define DBL_MIN_10_EXP (-307) #define DBL_MIN_EXP (-1021) #define DBL_MIN 2.2250738585072014e-308 #define DECIMAL_DIG LDBL_DECIMAL_DIG #define DYNAMIC 1 #define ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED 110000 #define FINITE_MATH_ONLY 0 #define FLT16_DECIMAL_DIG 5 #define FLT16_DENORM_MIN 5.9604644775390625e-8F16 #define FLT16_DIG 3 #define FLT16_EPSILON 9.765625e-4F16 #define FLT16_HAS_DENORM 1 #define FLT16_HAS_INFINITY 1 #define FLT16_HAS_QUIET_NAN 1 #define FLT16_MANT_DIG 11 #define FLT16_MAX_10_EXP 4 #define FLT16_MAX_EXP 16 #define FLT16_MAX 6.5504e+4F16 #define FLT16_MIN_10_EXP (-4) #define FLT16_MIN_EXP (-13) #define FLT16_MIN 6.103515625e-5F16 #define FLT_DECIMAL_DIG 9 #define FLT_DENORM_MIN 1.40129846e-45F #define FLT_DIG 6 #define FLT_EPSILON 1.19209290e-7F #define FLT_EVAL_METHOD 0 #define FLT_HAS_DENORM 1 #define FLT_HAS_INFINITY 1 #define FLT_HAS_QUIET_NAN 1 #define FLT_MANT_DIG 24 #define FLT_MAX_10_EXP 38 #define FLT_MAX_EXP 128 #define FLT_MAX 3.40282347e+38F #define FLT_MIN_10_EXP (-37) #define FLT_MIN_EXP (-125) #define FLT_MIN 1.17549435e-38F #define FLT_RADIX 2 #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 #define __GCC_ATOMIC_INT_LOCK_FREE 2 #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 #define __GCC_ATOMIC_LONG_LOCK_FREE 2 #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 #define GNUC_MINOR 2 #define GNUC_PATCHLEVEL 1 #define GNUC_STDC_INLINE 1 #define 
GNUC 4 #define __GXX_ABI_VERSION 1002 #define INT16_C_SUFFIX #define INT16_FMTd "hd" #define INT16_FMTi "hi" #define INT16_MAX 32767 #define INT16_TYPE short #define INT32_C_SUFFIX #define INT32_FMTd "d" #define INT32_FMTi "i" #define INT32_MAX 2147483647 #define INT32_TYPE int #define INT64_C_SUFFIX LL #define INT64_FMTd "lld" #define INT64_FMTi "lli" #define INT64_MAX 9223372036854775807LL #define INT64_TYPE long long int #define INT8_C_SUFFIX #define INT8_FMTd "hhd" #define INT8_FMTi "hhi" #define INT8_MAX 127 #define INT8_TYPE signed char #define INTMAX_C_SUFFIX L #define INTMAX_FMTd "ld" #define INTMAX_FMTi "li" #define INTMAX_MAX 9223372036854775807L #define INTMAX_TYPE long int #define INTMAX_WIDTH 64 #define INTPTR_FMTd "ld" #define INTPTR_FMTi "li" #define INTPTR_MAX 9223372036854775807L #define INTPTR_TYPE long int #define INTPTR_WIDTH 64 #define INT_FAST16_FMTd "hd" #define INT_FAST16_FMTi "hi" #define INT_FAST16_MAX 32767 #define INT_FAST16_TYPE short #define INT_FAST32_FMTd "d" #define INT_FAST32_FMTi "i" #define INT_FAST32_MAX 2147483647 #define INT_FAST32_TYPE int #define INT_FAST64_FMTd "lld" #define INT_FAST64_FMTi "lli" #define INT_FAST64_MAX 9223372036854775807LL #define INT_FAST64_TYPE long long int #define INT_FAST8_FMTd "hhd" #define INT_FAST8_FMTi "hhi" #define INT_FAST8_MAX 127 #define INT_FAST8_TYPE signed char #define INT_LEAST16_FMTd "hd" #define INT_LEAST16_FMTi "hi" #define INT_LEAST16_MAX 32767 #define INT_LEAST16_TYPE short #define INT_LEAST32_FMTd "d" #define INT_LEAST32_FMTi "i" #define INT_LEAST32_MAX 2147483647 #define INT_LEAST32_TYPE int #define INT_LEAST64_FMTd "lld" #define INT_LEAST64_FMTi "lli" #define INT_LEAST64_MAX 9223372036854775807LL #define INT_LEAST64_TYPE long long int #define INT_LEAST8_FMTd "hhd" #define INT_LEAST8_FMTi "hhi" #define INT_LEAST8_MAX 127 #define INT_LEAST8_TYPE signed char #define INT_MAX 2147483647 #define LDBL_DECIMAL_DIG 17 #define LDBL_DENORM_MIN 4.9406564584124654e-324L #define LDBL_DIG 15 
#define LDBL_EPSILON 2.2204460492503131e-16L #define LDBL_HAS_DENORM 1 #define LDBL_HAS_INFINITY 1 #define LDBL_HAS_QUIET_NAN 1 #define LDBL_MANT_DIG 53 #define LDBL_MAX_10_EXP 308 #define LDBL_MAX_EXP 1024 #define LDBL_MAX 1.7976931348623157e+308L #define LDBL_MIN_10_EXP (-307) #define LDBL_MIN_EXP (-1021) #define LDBL_MIN 2.2250738585072014e-308L #define LITTLE_ENDIAN 1 #define LONG_LONG_MAX 9223372036854775807LL #define LONG_MAX 9223372036854775807L #define LP64 1 #define MACH 1 #define __OBJC_BOOL_IS_BOOL 1 #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 #define __OPENCL_MEMORY_SCOPE_DEVICE 2 #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 #define OPTIMIZE 1 #define ORDER_BIG_ENDIAN 4321 #define ORDER_LITTLE_ENDIAN 1234 #define ORDER_PDP_ENDIAN 3412 #define PIC 2 #define POINTER_WIDTH 64 #define __PRAGMA_REDEFINE_EXTNAME 1 #define PTRDIFF_FMTd "ld" #define PTRDIFF_FMTi "li" #define PTRDIFF_MAX 9223372036854775807L #define PTRDIFF_TYPE long int #define PTRDIFF_WIDTH 64 #define REGISTER_PREFIX #define SCHAR_MAX 127 #define SHRT_MAX 32767 #define SIG_ATOMIC_MAX 2147483647 #define SIG_ATOMIC_WIDTH 32 #define SIZEOF_DOUBLE 8 #define SIZEOF_FLOAT 4 #define SIZEOF_INT128 16 #define SIZEOF_INT 4 #define SIZEOF_LONG_DOUBLE 8 #define SIZEOF_LONG_LONG 8 #define SIZEOF_LONG 8 #define SIZEOF_POINTER 8 #define SIZEOF_PTRDIFF_T 8 #define SIZEOF_SHORT 2 #define SIZEOF_SIZE_T 8 #define SIZEOF_WCHAR_T 4 #define SIZEOF_WINT_T 4 #define SIZE_FMTX "lX" #define SIZE_FMTo "lo" #define SIZE_FMTu "lu" #define SIZE_FMTx "lx" #define SIZE_MAX 18446744073709551615UL #define SIZE_TYPE long unsigned int #define SIZE_WIDTH 64 #define SSP 1 #define STDC_HOSTED 1 #define STDC_NO_THREADS 1 #define STDC_UTF_16 1 #define STDC_UTF_32 1 #define STDC_VERSION 199901L #define STDC 1 #define STRICT_ANSI 1 #define UINT16_C_SUFFIX #define UINT16_FMTX "hX" #define UINT16_FMTo "ho" #define UINT16_FMTu "hu" #define UINT16_FMTx 
"hx" #define UINT16_MAX 65535 #define UINT16_TYPE unsigned short #define UINT32_C_SUFFIX U #define UINT32_FMTX "X" #define UINT32_FMTo "o" #define UINT32_FMTu "u" #define UINT32_FMTx "x" #define UINT32_MAX 4294967295U #define UINT32_TYPE unsigned int #define UINT64_C_SUFFIX ULL #define UINT64_FMTX "llX" #define UINT64_FMTo "llo" #define UINT64_FMTu "llu" #define UINT64_FMTx "llx" #define UINT64_MAX 18446744073709551615ULL #define UINT64_TYPE long long unsigned int #define UINT8_C_SUFFIX #define UINT8_FMTX "hhX" #define UINT8_FMTo "hho" #define UINT8_FMTu "hhu" #define UINT8_FMTx "hhx" #define UINT8_MAX 255 #define UINT8_TYPE unsigned char #define UINTMAX_C_SUFFIX UL #define UINTMAX_FMTX "lX" #define UINTMAX_FMTo "lo" #define UINTMAX_FMTu "lu" #define UINTMAX_FMTx "lx" #define UINTMAX_MAX 18446744073709551615UL #define UINTMAX_TYPE long unsigned int #define UINTMAX_WIDTH 64 #define UINTPTR_FMTX "lX" #define UINTPTR_FMTo "lo" #define UINTPTR_FMTu "lu" #define UINTPTR_FMTx "lx" #define UINTPTR_MAX 18446744073709551615UL #define UINTPTR_TYPE long unsigned int #define UINTPTR_WIDTH 64 #define UINT_FAST16_FMTX "hX" #define UINT_FAST16_FMTo "ho" #define UINT_FAST16_FMTu "hu" #define UINT_FAST16_FMTx "hx" #define UINT_FAST16_MAX 65535 #define UINT_FAST16_TYPE unsigned short #define UINT_FAST32_FMTX "X" #define UINT_FAST32_FMTo "o" #define UINT_FAST32_FMTu "u" #define UINT_FAST32_FMTx "x" #define UINT_FAST32_MAX 4294967295U #define UINT_FAST32_TYPE unsigned int #define UINT_FAST64_FMTX "llX" #define UINT_FAST64_FMTo "llo" #define UINT_FAST64_FMTu "llu" #define UINT_FAST64_FMTx "llx" #define UINT_FAST64_MAX 18446744073709551615ULL #define UINT_FAST64_TYPE long long unsigned int #define UINT_FAST8_FMTX "hhX" #define UINT_FAST8_FMTo "hho" #define UINT_FAST8_FMTu "hhu" #define UINT_FAST8_FMTx "hhx" #define UINT_FAST8_MAX 255 #define UINT_FAST8_TYPE unsigned char #define UINT_LEAST16_FMTX "hX" #define UINT_LEAST16_FMTo "ho" #define UINT_LEAST16_FMTu "hu" #define 
UINT_LEAST16_FMTx "hx" #define UINT_LEAST16_MAX 65535 #define UINT_LEAST16_TYPE unsigned short #define UINT_LEAST32_FMTX "X" #define UINT_LEAST32_FMTo "o" #define UINT_LEAST32_FMTu "u" #define UINT_LEAST32_FMTx "x" #define UINT_LEAST32_MAX 4294967295U #define UINT_LEAST32_TYPE unsigned int #define UINT_LEAST64_FMTX "llX" #define UINT_LEAST64_FMTo "llo" #define UINT_LEAST64_FMTu "llu" #define UINT_LEAST64_FMTx "llx" #define UINT_LEAST64_MAX 18446744073709551615ULL #define UINT_LEAST64_TYPE long long unsigned int #define UINT_LEAST8_FMTX "hhX" #define UINT_LEAST8_FMTo "hho" #define UINT_LEAST8_FMTu "hhu" #define UINT_LEAST8_FMTx "hhx" #define UINT_LEAST8_MAX 255 #define UINT_LEAST8_TYPE unsigned char #define USER_LABEL_PREFIX _ #define VERSION "Apple LLVM 12.0.0 (clang-1200.0.32.28)" #define WCHAR_MAX 2147483647 #define WCHAR_TYPE int #define WCHAR_WIDTH 32 #define WINT_MAX 2147483647 #define WINT_TYPE int #define WINT_WIDTH 32 #define aarch64 1 #define apple_build_version 12000032 #define __arm64 1 #define arm64 1 #define __block attribute((blocks(byref))) #define clang 1 #define clang_major 12 #define clang_minor 0 #define clang_patchlevel 0 #define clang_version "12.0.0 (clang-1200.0.32.28)" #define llvm 1 #define __nonnull _Nonnull #define __null_unspecified _Null_unspecified #define __nullable _Nullable #define pic 2 #define __strong #define __unsafe_unretained #define __weak attribute((objc_gc(weak))) *** /usr/bin/clang done *** *** /usr/local/bin/gcc -fopenmp *** #define DBL_MIN_EXP (-1021) #define UINT_LEAST16_MAX 0xffff #define __ARM_SIZEOF_WCHAR_T 4 #define DBL_DECIMAL_DIG 17 #define __ATOMIC_ACQUIRE 2 #define FLT_MIN 1.1754943508222875e-38F #define __GCC_IEC_559_COMPLEX 2 #define UINT_LEAST8_TYPE unsigned char #define __INTMAX_C(c) c ## L #define UINT8_MAX 0xff #define WCHAR_MAX 0x7fffffff #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 #define __GCC_IEC_559 2 #define 
__FLT32X_DECIMAL_DIG__ 17
#define __FLT_EVAL_METHOD__ 0
#define __FLT64_DECIMAL_DIG__ 17
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_MAX__ 0xffffffffffffffffULL
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __FLT32X_MAX_EXP__ 1024
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GNUC_PATCHLEVEL__ 0
#define __FLT32_HAS_DENORM__ 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __UINT_FAST8_MAX__ 0xff
#define __INT8_C(c) c
#define __ARM_64BIT_STATE 1
#define __INT_LEAST8_WIDTH__ 8
#define __INTMAX_TYPE__ long int
#define __UINT_LEAST64_MAX__ 0xffffffffffffffffULL
#define __SHRT_MAX__ 0x7fff
#define __LDBL_MAX__ 1.7976931348623157e+308L
#define __ARM_FEATURE_IDIV 1
#define __LDBL_IS_IEC_60559__ 2
#define __ARM_FP 14
#define __DYNAMIC__ 1
#define __UINT_LEAST8_MAX__ 0xff
#define __APPLE_CC__ 1
#define __UINTMAX_TYPE__ long unsigned int
#define __FLT_EVAL_METHOD_TS_18661_3__ 0
#define __UINT32_MAX__ 0xffffffffU
#define __DBL_DENORM_MIN__ ((double)4.9406564584124654e-324L)
#define __AARCH64_CMODEL_SMALL__ 1
#define __LDBL_MAX_EXP__ 1024
#define __CHAR_BIT__ 8
#define __FLT32X_IS_IEC_60559__ 2
#define __INT_LEAST16_WIDTH__ 16
#define __ARM_ALIGN_MAX_STACK_PWR 16
#define __SCHAR_MAX__ 0x7f
#define __DBL_MAX__ ((double)1.7976931348623157e+308L)
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __INT64_C(c) c ## LL
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __SIZEOF_INT__ 4
#define __INT_FAST64_WIDTH__ 64
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __FLT32X_MANT_DIG__ 53
#define __USER_LABEL_PREFIX__ _
#define __FLT32_MAX_10_EXP__ 38
#define __STDC_HOSTED__ 1
#define __DBL_DIG__ 15
#define __FLT32_DIG__ 6
#define __FLT_EPSILON__ 1.1920928955078125e-7F
#define __SHRT_WIDTH__ 16
#define __FLT32_IS_IEC_60559__ 2
#define __LDBL_MIN__ 2.2250738585072014e-308L
#define __WINT_TYPE__ int
#define __FLT16_HAS_QUIET_NAN__ 1
#define __strong
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __FP_FAST_FMA 1
#define __FLT32X_HAS_INFINITY__ 1
#define __INT32_MAX__ 0x7fffffff
#define __INT_WIDTH__ 32
#define __SIZEOF_LONG__ 8
#define __APPLE__ 1
#define __UINT16_C(c) c
#define __DECIMAL_DIG__ 17
#define __FLT64_EPSILON__ 2.2204460492503131e-16F64
#define __INT16_MAX__ 0x7fff
#define __LDBL_HAS_QUIET_NAN__ 1
#define __FLT16_MIN_EXP__ (-13)
#define __FLT64_MANT_DIG__ 53
#define __LDBL_MANT_DIG__ 53
#define __GNUC__ 11
#define __FLT_HAS_DENORM__ 1
#define __SIZEOF_LONG_DOUBLE__ 8
#define __LDBL_MIN_EXP__ (-1021)
#define __FLT64_MAX_10_EXP__ 308
#define __FLT16_MAX_10_EXP__ 4
#define __DBL_IS_IEC_60559__ 2
#define __FLT32_HAS_INFINITY__ 1
#define __LDBL_HAS_DENORM__ 1
#define __DBL_HAS_INFINITY__ 1
#define __HAVE_SPECULATION_SAFE_VALUE 1
#define __INTPTR_WIDTH__ 64
#define __FLT32X_HAS_DENORM__ 1
#define __INT_FAST16_TYPE__ short int
#define __STRICT_ANSI__ 1
#define __FLT32_DECIMAL_DIG__ 9
#define __INT_LEAST32_MAX__ 0x7fffffff
#define __weak
#define __DBL_MAX_EXP__ 1024
#define __WCHAR_WIDTH__ 32
#define __FLT32_MAX__ 3.4028234663852886e+38F32
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __FLT16_DECIMAL_DIG__ 5
#define __FLT_IS_IEC_60559__ 2
#define __FLT32_HAS_QUIET_NAN__ 1
#define __LONG_LONG_MAX__ 0x7fffffffffffffffLL
#define __SIZEOF_SIZE_T__ 8
#define __SIG_ATOMIC_WIDTH__ 32
#define __ARM_ALIGN_MAX_PWR 28
#define __SIZEOF_WINT_T__ 4
#define __LONG_LONG_WIDTH__ 64
#define __FLT32_MAX_EXP__ 128
#define __ARM_FP16_FORMAT_IEEE 1
#define __FLT_MIN_EXP__ (-125)
#define __FLT64_NORM_MAX__ 1.7976931348623157e+308F64
#define __FLT32X_MIN_EXP__ (-1021)
#define __INT_FAST64_TYPE__ long long int
#define __ARM_FP16_ARGS 1
#define __FP_FAST_FMAF 1
#define __FP_FAST_FMAL 1
#define __FLT64_DENORM_MIN__ 4.9406564584124654e-324F64
#define __DBL_MIN__ ((double)2.2250738585072014e-308L)
#define __ARM_FEATURE_CLZ 1
#define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16
#define __SIZEOF_POINTER__ 8
#define __GXX_ABI_VERSION 1015
#define __SIZE_TYPE__ long unsigned int
#define __LP64__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __FLT_EVAL_METHOD_C99__ 0
#define __FLT32X_EPSILON__ 2.2204460492503131e-16F32x
#define __FLT64_MIN_EXP__ (-1021)
#define __UINT64_MAX__ 0xffffffffffffffffULL
#define __LDBL_DECIMAL_DIG__ 17
#define __FLT_MAX__ 3.4028234663852886e+38F
#define __aarch64__ 1
#define __FLT64_MIN_10_EXP__ (-307)
#define __REGISTER_PREFIX__
#define __UINT16_MAX__ 0xffff
#define __LDBL_HAS_INFINITY__ 1
#define __FLT_DIG__ 6
#define __DEC_EVAL_METHOD__ 2
#define __FLT_MANT_DIG__ 24
#define __FLT16_MIN_10_EXP__ (-4)
#define __VERSION__ "11.0.0 20201128 (experimental)"
#define __UINT64_C(c) c ## ULL
#define __WINT_MAX__ 0x7fffffff
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __FLT32X_MIN__ 2.2250738585072014e-308F32x
#define __FLT32_MANT_DIG__ 24
#define __AARCH64EL__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __FLT16_MAX_EXP__ 16
#define __BIGGEST_ALIGNMENT__ 16
#define __INT32_C(c) c
#define __FLT16_DIG__ 3
#define __SCHAR_WIDTH__ 8
#define __ORDER_PDP_ENDIAN__ 3412
#define __INT_FAST32_TYPE__ int
#define __UINT_LEAST16_TYPE__ short unsigned int
#define __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ 110000
#define __ARM_FEATURE_FMA 1
#define __INT8_TYPE__ signed char
#define __SIG_ATOMIC_TYPE__ int
#define __GCC_ASM_FLAG_OUTPUTS__ 1
#define __arm64__ 1
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __FLT_RADIX__ 2
#define __INT_LEAST16_TYPE__ short int
#define __ARM_ARCH_PROFILE 65
#define __LDBL_EPSILON__ 2.2204460492503131e-16L
#define __UINTMAX_C(c) c ## UL
#define __ARM_PCS_AAPCS64 1
#define __SIG_ATOMIC_MAX__ 0x7fffffff
#define __OPTIMIZE__ 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_PTRDIFF_T__ 8
#define __arm64 1
#define __ATOMIC_RELAXED 0
#define __INT_FAST32_WIDTH__ 32
#define __LDBL_DIG__ 15
#define __FLT64_IS_IEC_60559__ 2
#define __FLT16_IS_IEC_60559__ 2
#define __FLT64_DIG__ 15
#define __UINT_FAST32_MAX__ 0xffffffffU
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __FLT16_EPSILON__ 9.7656250000000000e-4F16
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 0x7fffffffffffffffL
#define __FLT_HAS_INFINITY__ 1
#define __DBL_HAS_DENORM__ 1
#define __UINT_FAST16_TYPE__ short unsigned int
#define __FLT32X_HAS_QUIET_NAN__ 1
#define __CHAR16_TYPE__ short unsigned int
#define __SIZE_WIDTH__ 64
#define __INTMAX_WIDTH__ 64
#define __INT_LEAST16_MAX__ 0x7fff
#define __FLT16_NORM_MAX__ 6.5504000000000000e+4F16
#define __INT64_MAX__ 0x7fffffffffffffffLL
#define __FLT32_DENORM_MIN__ 1.4012984643248171e-45F32
#define __INT_LEAST64_TYPE__ long long int
#define __INT16_TYPE__ short int
#define __INT_LEAST8_TYPE__ signed char
#define __FLT16_MAX__ 6.5504000000000000e+4F16
#define __STDC_VERSION__ 199901L
#define __INT_FAST8_MAX__ 0x7f
#define __ARM_ARCH 8
#define __INTPTR_MAX__ 0x7fffffffffffffffL
#define __ARM_FEATURE_UNALIGNED 1
#define __FLT64_HAS_QUIET_NAN__ 1
#define __FLT32X_DIG__ 15
#define __UINT8_TYPE__ unsigned char
#define __PTRDIFF_WIDTH__ 64
#define __CONSTANT_CFSTRINGS__ 1
#define __FLT64_HAS_INFINITY__ 1
#define __FLT16_HAS_INFINITY__ 1
#define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1)
#define __PTRDIFF_MAX__ 0x7fffffffffffffffL
#define __FLT16_MANT_DIG__ 11
#define __INTPTR_TYPE__ long int
#define __UINT16_TYPE__ short unsigned int
#define __WCHAR_TYPE__ int
#define __pic__ 2
#define __UINTPTR_MAX__ 0xffffffffffffffffUL
#define __ARM_ARCH_8A 1
#define __INT_FAST64_MAX__ 0x7fffffffffffffffLL
#define __FLT_NORM_MAX__ 3.4028234663852886e+38F
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __INT_MAX__ 0x7fffffff
#define __INT64_TYPE__ long long int
#define __FLT_MAX_EXP__ 128
#define __ORDER_BIG_ENDIAN__ 4321
#define __DBL_MANT_DIG__ 53
#define __INT_LEAST64_MAX__ 0x7fffffffffffffffLL
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __FP_FAST_FMAF32 1
#define __UINT_LEAST32_TYPE__ unsigned int
#define __SIZEOF_SHORT__ 2
#define __FLT32_NORM_MAX__ 3.4028234663852886e+38F32
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __FLT64_MAX__ 1.7976931348623157e+308F64
#define __MACH__ 1
#define __LITTLE_ENDIAN__ 1
#define __WINT_WIDTH__ 32
#define __FP_FAST_FMAF64 1
#define __INT_LEAST8_MAX__ 0x7f
#define __INT_LEAST64_WIDTH__ 64
#define __FLT32X_MAX_10_EXP__ 308
#define __INT_FAST16_MAX__ 0x7fff
#define __SIZEOF_INT128__ 16
#define __FLT16_MIN__ 6.1035156250000000e-5F16
#define __LDBL_MAX_10_EXP__ 308
#define __DBL_EPSILON__ ((double)2.2204460492503131e-16L)
#define __FLT32_MIN_EXP__ (-125)
#define _LP64 1
#define __UINT8_C(c) c
#define __FLT64_MAX_EXP__ 1024
#define __INT_LEAST32_TYPE__ int
#define __UINT64_TYPE__ long long unsigned int
#define __ARM_NEON 1
#define __INT_FAST32_MAX__ 0x7fffffff
#define __INTMAX_MAX__ 0x7fffffffffffffffL
#define __UINT_FAST8_TYPE__ unsigned char
#define __INT_FAST8_TYPE__ signed char
#define __GNUC_STDC_INLINE__ 1
#define __FLT64_HAS_DENORM__ 1
#define _OPENMP 201511
#define __FLT32_EPSILON__ 1.1920928955078125e-7F32
#define __FP_FAST_FMAF32x 1
#define __FLT16_HAS_DENORM__ 1
#define __INT_FAST8_WIDTH__ 8
#define __FLT32X_MAX__ 1.7976931348623157e+308F32x
#define __DBL_NORM_MAX__ ((double)1.7976931348623157e+308L)
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __LDBL_DENORM_MIN__ 4.9406564584124654e-324L
#define __SIZEOF_WCHAR_T__ 4
#define __UINT32_C(c) c ## U
#define __FLT_DENORM_MIN__ 1.4012984643248171e-45F
#define __WINT_MIN__ (-__WINT_MAX__ - 1)
#define __INT8_MAX__ 0x7f
#define __LONG_WIDTH__ 64
#define __PIC__ 2
#define __FLT32X_NORM_MAX__ 1.7976931348623157e+308F32x
#define __CHAR32_TYPE__ unsigned int
#define __FLT32_MIN_10_EXP__ (-37)
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __INT32_TYPE__ int
#define __SIZEOF_DOUBLE__ 8
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT64_MIN__ 2.2250738585072014e-308F64
#define __INT_LEAST32_WIDTH__ 32
#define __SIZEOF_FLOAT__ 4
#define __ATOMIC_CONSUME 1
#define __GNUC_MINOR__ 0
#define __INT_FAST16_WIDTH__ 16
#define __UINTMAX_MAX__ 0xffffffffffffffffUL
#define __FLT32X_DENORM_MIN__ 4.9406564584124654e-324F32x
#define __DBL_MAX_10_EXP__ 308
#define __INT16_C(c) c
#define __ARM_ARCH_ISA_A64 1
#define __STDC__ 1
#define __PTRDIFF_TYPE__ long int
#define __FLT32_MIN__ 1.1754943508222875e-38F32
#define __ATOMIC_SEQ_CST 5
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1
#define __UINT32_TYPE__ unsigned int
#define __FLT32X_MIN_10_EXP__ (-307)
#define __UINTPTR_TYPE__ long unsigned int
#define __LDBL_MIN_10_EXP__ (-307)
#define __SIZEOF_LONG_LONG__ 8
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __FLT_DECIMAL_DIG__ 9
#define __UINT_FAST16_MAX__ 0xffff
#define __LDBL_NORM_MAX__ 1.7976931348623157e+308L
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __SIZE_MAX__ 0xffffffffffffffffUL
#define __UINT_LEAST32_MAX__ 0xffffffffU
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_RELEASE 3
*** /usr/local/bin/gcc -fopenmp done ***

On 16 Jan 2021, at 9:32 am, Manodeep Sinha [email protected] wrote:

@karlglazebrook Do you mind copy-pasting the output of:

#!/bin/bash
declare -a compilers=("/usr/bin/clang" "/usr/local/bin/gcc -fopenmp")
for cc in "${compilers[@]}"
do
    echo "*** $cc ***"
    $cc -std=c99 -march=native -O3 -dM -E - < /dev/null
    echo "*** $cc done ***"
done

This will give a hint as to which macros each compiler predefines for the OS + instruction set.


Jan 16 '21 08:01 karlglazebrook

This is great - thanks @karlglazebrook!

Looks like __aarch64__, __arm64, and __arm64__ are all defined (equal to 1) in all three compiler outputs. For the ISA, __ARM_NEON (=1) is defined for gcc (with or without -march=native), while __ARM_NEON (=1), __ARM_NEON__ (=1) and __ARM_NEON_FP (=0xE) are defined for clang.

So the platform can be detected with any of __aarch64__, __arm64, or __arm64__ (or, to be on the safe side, an || between all three), and the code can then return the FALLBACK ISA before the cpuid call is ever reached.
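A minimal sketch of that guard, assuming GCC or clang. `IS_ARM64_TARGET`, `HAVE_X86_CPUID` and `detect_isa_sketch` are hypothetical names (not Corrfunc code), and the x86 path uses the `__get_cpuid` wrapper from the compilers' `<cpuid.h>` in place of Corrfunc's own inline assembly. Because the `#if` removes all x86 code from an ARM build, the compiler never sees the `cpuid` register constraints that caused the "impossible constraint in 'asm'" error.

```c
/* Hypothetical helper macro: true when compiling for 64-bit ARM such as the
   Apple M1. OR-ing all three spellings is the defensive option. */
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64)
#define IS_ARM64_TARGET 1
#else
#define IS_ARM64_TARGET 0
#endif

/* Only x86 builds with GCC/clang ever include the cpuid wrapper header */
#if !IS_ARM64_TARGET && (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Values mirror FALLBACK and SSE2 in the isa enum of cpu_features.h */
enum { SKETCH_FALLBACK = 0, SKETCH_SSE2 = 2 };

static int detect_isa_sketch(void)
{
#if IS_ARM64_TARGET
    /* ARM: return before any x86-only instruction is emitted */
    return SKETCH_FALLBACK;
#elif HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return SKETCH_FALLBACK;          /* cpuid leaf 1 not supported */
    }
    /* EDX bit 26 of leaf 1 is the SSE2 feature flag */
    return (edx & (1u << 26)) ? SKETCH_SSE2 : SKETCH_FALLBACK;
#else
    /* Any other architecture: no detection, use the portable kernel */
    return SKETCH_FALLBACK;
#endif
}
```

On an M1 build this returns the fallback value; on x86_64 it reports SSE2, which is part of the x86_64 baseline.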

If we add any NEON kernels in the future, those will have to be protected by #ifdef __ARM_NEON conditions, and the corresponding cpuid check will have to be updated to whatever assembly call is actually necessary. (The compile-time check is necessary, but the CPU at runtime may be different.)
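A sketch of the compile-time half of that, using a toy kernel (the hypothetical `sum_floats`, not one of Corrfunc's pair-counting kernels) whose NEON path is compiled only when `__ARM_NEON` is defined, with a scalar loop otherwise. It additionally requires `__aarch64__` because the horizontal add `vaddvq_f32` is an AArch64-only intrinsic; the runtime check is not covered here.

```c
#include <stddef.h>

/* vaddvq_f32 exists only on AArch64, so require both macros */
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#define HAVE_NEON_KERNEL 1
#else
#define HAVE_NEON_KERNEL 0
#endif

/* Hypothetical toy kernel: sum of n floats */
static float sum_floats(const float *x, size_t n)
{
    size_t i = 0;
    float total = 0.0f;
#if HAVE_NEON_KERNEL
    /* NEON path: four floats per 128-bit register */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(x + i));
    }
    total = vaddvq_f32(acc);   /* horizontal add of the four lanes */
#endif
    /* Scalar remainder, or the whole array when NEON is not compiled in */
    for (; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

The NEON path adds in a different order than the scalar loop, so for general inputs the two can differ in the last bits; small integer-valued inputs sum exactly either way.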

@lgarrison What do you think?

Jan 17 '21 02:01 manodeep

I will also note that there might be "undocumented" vectorised calls - someone dug these instructions out. Here's my fork of their secret gist. Found the gist through here

Jan 17 '21 02:01 manodeep

So FALLBACK is returned if any of __aarch64__, __arm64, or __arm64__ is detected? Sounds good to me!

Would be a fun project to try to get those undocumented vector calls to work!

Jan 17 '21 20:01 lgarrison

Here are the (untested) updates to the cpu_features.[ch] files.

cpu_features.h


/* File: cpu_features.h */
/*
  This file is a part of the Corrfunc package
  Copyright (C) 2015-- Manodeep Sinha ([email protected])
  License: MIT LICENSE. See LICENSE file under the top-level
  directory at https://github.com/manodeep/Corrfunc/


  Adapted from Agner Fog's vectorclass: http://agner.org/
*/

#pragma once
#include <stdint.h>
#include <stdbool.h>

#ifdef __cplusplus
 extern "C" {
#endif

typedef enum {
  DEFAULT=-42,/* present simply to make the enum a signed int*/
  FALLBACK=0, /* No special options */
  SSE=1,  /* 128 bit vectors (single precision only) */
  SSE2=2, /* 128 bit vectors */
  SSE3=3, /* 128 bit vectors */
  SSSE3=4, /* 128 bit vectors */
  SSE4=5,/* 128bit vectors */
  SSE42=6, /* 128bit vectors with blend operations */
  AVX=7, /* 256bit vector width */
  AVX2=8,  /* AVX2 (integer operations)*/
  AVX512F=9,/* AVX 512 Foundation */
  NUM_ISA  /*NUM_ISA will be the next integer after
            the last declared enum. AVX512F:=9 (so, NUM_ISA==10)*/
} isa;  //name for instruction sets -> corresponds to the return values for functions in cpu_features.c


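static inline void cpuid (int output[4], int functionnumber)
{
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* Assuming ARM64 (and hopefully also ARM32): there is no cpuid instruction */
    output[0] = output[1] = output[2] = output[3] = 0;  /* report "no features" */
    (void) functionnumber;                               /* avoid -Wunused-parameter */
    return;
#else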
    /* Assuming x86_64 arch */
#if defined(__GNUC__) || defined(__clang__)              // use inline assembly, Gnu/AT&T syntax

   int a, b, c, d;
   __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) );
   output[0] = a;
   output[1] = b;
   output[2] = c;
   output[3] = d;

#else                                                      // unknown platform. try inline assembly with masm/intel syntax

    __asm {
        mov eax, functionnumber
        xor ecx, ecx
        cpuid;
        mov esi, output
        mov [esi],    eax
        mov [esi+4],  ebx
        mov [esi+8],  ecx
        mov [esi+12], edx
    }
#endif
#endif /* end of x86_64 arch */
}

// Define interface to xgetbv instruction
static inline int64_t xgetbv (int ctr)
{
#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* Assuming ARM64 (and hopefully also ARM32): there is no xgetbv instruction */
    (void) ctr;                                          /* avoid -Wunused-parameter */
    return 0;
#else
    /* Assuming x86_64 */
#if (defined (__INTEL_COMPILER) && __INTEL_COMPILER >= 1200) //Intel compiler supporting _xgetbv intrinsic
    return _xgetbv(ctr);                                   // intrinsic function for XGETBV
#elif defined(__GNUC__)                                    // use inline assembly, Gnu/AT&T syntax
   uint32_t a, d;
   __asm("xgetbv" : "=a"(a),"=d"(d) : "c"(ctr) : );
   return a | (((uint64_t) d) << 32);
#else  
   uint32_t a, d;
    __asm {
        mov ecx, ctr
        _emit 0x0f
        _emit 0x01
        _emit 0xd0 ; // xgetbv
        mov a, eax
        mov d, edx
    }
    return a | (((uint64_t) d) << 32);
#endif
#endif /* end of x86_64 arch */
}

extern int runtime_instrset_detect(void);
extern int get_max_usable_isa(void);

#ifdef __cplusplus
}
#endif

cpu_features.c

/* File: cpu_features.c */
/*
  This file is a part of the Corrfunc package
  Copyright (C) 2015-- Manodeep Sinha ([email protected])
  License: MIT LICENSE. See LICENSE file under the top-level
  directory at https://github.com/manodeep/Corrfunc/

  Adapted from Agner Fog's vectorclass: http://agner.org/
*/

#include <stdio.h>

#include "cpu_features.h"

// Use CPUID to detect what instruction sets the CPU supports
// The compiler may not support all these features though!
// Use get_max_usable_isa() to find the max ISA supported
// by both the compiler and CPU
int runtime_instrset_detect(void)
{
    static int iset = -1;                                  // remember value for next call
    if (iset >= 0) {
        return iset;                                       // called before
    }
    iset = FALLBACK;                                       // default value

#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    /* assuming ARM (aarch64, and hopefully aarch32) */
    return iset; /* should always be FALLBACK*/
#else
    /* Assuming x86_64 architecture */
    int abcd[4] = {0,0,0,0};                               // cpuid results
    cpuid(abcd, 0);                                        // call cpuid function 0
    if (abcd[0] == 0) return iset;                         // no further cpuid function supported
    cpuid(abcd, 1);                                        // call cpuid function 1 for feature flags
    if ((abcd[3] & (1 <<  0)) == 0) return iset;           // no floating point
    if ((abcd[3] & (1 << 23)) == 0) return iset;           // no MMX
    if ((abcd[3] & (1 << 15)) == 0) return iset;           // no conditional move
    if ((abcd[3] & (1 << 24)) == 0) return iset;           // no FXSAVE
    if ((abcd[3] & (1 << 25)) == 0) return iset;           // no SSE
    iset = SSE;                                            // 1: SSE supported

    if ((abcd[3] & (1 << 26)) == 0) return iset;           // no SSE2
    iset = SSE2;                                           // 2: SSE2 supported

    if ((abcd[2] & (1 <<  0)) == 0) return iset;           // no SSE3
    iset = SSE3;                                           // 3: SSE3 supported

    if ((abcd[2] & (1 <<  9)) == 0) return iset;           // no SSSE3
    iset = SSSE3;                                          // 4: SSSE3 supported

    if ((abcd[2] & (1 << 19)) == 0) return iset;           // no SSE4.1
    iset = SSE4;                                           // 5: SSE4.1 supported

    if ((abcd[2] & (1 << 23)) == 0) return iset;           // no POPCNT
    if ((abcd[2] & (1 << 20)) == 0) return iset;           // no SSE4.2
    iset = SSE42;                                          // 6: SSE4.2 supported

    if ((abcd[2] & (1 << 27)) == 0) return iset;           // no OSXSAVE
    if ((xgetbv(0) & 6) != 6)       return iset;           // AVX not enabled in O.S.
    if ((abcd[2] & (1 << 28)) == 0) return iset;           // no AVX
    iset = AVX;                                            // 7: AVX supported

    cpuid(abcd, 7);                                        // call cpuid leaf 7 for feature flags
    if ((abcd[1] & (1 <<  5)) == 0) return iset;           // no AVX2
    iset = AVX2;                                           // 8: AVX2 supported

    cpuid(abcd, 0xD);                                      // call cpuid leaf 0xD for feature flags
    if ((abcd[0] & 0x60) != 0x60)   return iset;           // no AVX512
    iset = AVX512F;                                        // 9: AVX512F supported
    return iset;
#endif /* end of x86_64 architecture specific code*/
}

// Report the max ISA supported by both the CPU and compiler
int get_max_usable_isa(void)
{
    static int iset = -1;                                  // remember value for next call
    if (iset >= 0) {
        return iset;                                       // called before
    }

#if defined(__aarch64__) || defined(__arm64__) || defined(__arm64) || defined(__aarch32__)
    iset = FALLBACK;
    return iset;
#endif

    iset = runtime_instrset_detect();

    switch(iset){
        case AVX512F:
#ifdef __AVX512F__
            iset = AVX512F;
            break;
#elif defined(GAS_BUG_DISABLE_AVX512)
            fprintf(stderr, "[Warning] AVX512F is disabled due to a GNU Assembler bug.  Upgrade to binutils >= 2.32 to fix this.\n");
#else
            fprintf(stderr, "[Warning] The CPU supports AVX512F but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case AVX2:
#ifdef __AVX2__
            iset = AVX2;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports AVX2 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case AVX:
#ifdef __AVX__
            iset = AVX;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports AVX but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSE42:
#ifdef __SSE4_2__
            iset = SSE42;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSE4.2 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSE4:
#ifdef __SSE4_1__
            iset = SSE4;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSE4.1 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSSE3:
#ifdef __SSSE3__
            iset = SSSE3;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSSE3 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSE3:
#ifdef __SSE3__
            iset = SSE3;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSE3 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSE2:
#ifdef __SSE2__
            iset = SSE2;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSE2 but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case SSE:
#ifdef __SSE__
            iset = SSE;
            break;
#else
            fprintf(stderr, "[Warning] The CPU supports SSE but the compiler does not.  Can you try another compiler?\n");
#endif
            // fall through
        case FALLBACK:
        default:
            iset = FALLBACK;
            break;
    }

    return iset;
}
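The ARM branches above hard-code FALLBACK and skip runtime detection. If NEON kernels are added, one option for the runtime side is to ask the operating system rather than issue a raw assembly call; this sketch swaps in that approach. `arm_runtime_has_neon` is a hypothetical name; the `hw.optional.neon` sysctl key is an assumption about what macOS reports on Apple Silicon (check with `sysctl hw.optional`), and the Linux path uses `getauxval(AT_HWCAP)` with the `HWCAP_ASIMD` bit. Wiring it into `runtime_instrset_detect()` would also need a NEON entry in the isa enum, which the posted header does not have.

```c
#include <stddef.h>

#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
#include <sys/sysctl.h>
#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical helper (not in Corrfunc): 1 if the running CPU reports
   Advanced SIMD (NEON), 0 otherwise or if the OS query fails. */
static int arm_runtime_has_neon(void)
{
#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
    int value = 0;
    size_t len = sizeof(value);
    /* macOS exposes CPU features as hw.optional.* sysctl entries (assumed key) */
    if (sysctlbyname("hw.optional.neon", &value, &len, NULL, 0) != 0) {
        return 0;
    }
    return value != 0;
#elif defined(__linux__) && defined(__aarch64__)
    /* Linux reports AArch64 Advanced SIMD via the HWCAP_ASIMD auxv bit */
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
    return 0;   /* not an ARM64 build: nothing to report */
#endif
}
```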

@lgarrison Will you please check whether the updates and early returns make sense?

Jan 21 '21 03:01 manodeep

The updates from the previous comment do get the code to build and install. I wonder whether a more systematic fix is possible?

Nov 17 '22 05:11 misharash

@misharash You might be interested in the initial implementation for the M1 architecture within the arm64 branch on this repo

Nov 17 '22 11:11 manodeep

Right, the arm64 branch (PR here) works on the M1 with both fallback and NEON kernels. So you can just use that branch instead of patching your source with the previous code.

@manodeep can confirm, but the fallback kernel from that branch should definitely be safe to use. It sounds like the NEON kernel is working too, but is less tested.

Nov 17 '22 14:11 lgarrison

Thank you! Initially it wasn't clear that the NEON pull request was related to Apple Silicon support.

Nov 17 '22 17:11 misharash

Solved by #295. The Corrfunc master branch should now compile and run on Apple laptops with M1/M2 CPUs.

Side note: optimised kernels are being (slowly) implemented in the arm64 branch.

Jul 17 '23 11:07 manodeep
