whisper.cpp
whisper.cpp copied to clipboard
Make failure - error: inlining failed in call to always_inline
Thanks for releasing this software, I'm trying to build this in a Debian system but make is failing. The Compiler versions are
❯ cc --version
cc (Debian 8.3.0-6) 8.3.0
❯ c++ --version
c++ (Debian 8.3.0-6) 8.3.0
When trying make base.en
in the master
branch the following error happens
❯ make base.en
cc -I. -O3 -std=c11 -pthread -mavx -c ggml.c -o ggml.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
ggml.c: In function ‘ggml_vec_mad_f16’:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1024:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 24), _mm256_cvtps_ph(y3, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1023:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 16), _mm256_cvtps_ph(y2, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1022:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 8 ), _mm256_cvtps_ph(y1, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1021:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 0 ), _mm256_cvtps_ph(y0, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1014:14: note: called from here
x3 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 24)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1013:14: note: called from here
x2 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 16)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1012:14: note: called from here
x1 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 8 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1011:14: note: called from here
x0 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 0 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1009:14: note: called from here
y3 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 24)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1008:14: note: called from here
y2 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 16)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1007:14: note: called from here
y1 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 8 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1006:14: note: called from here
y0 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 0 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1006:14: note: called from here
y0 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 0 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1007:14: note: called from here
y1 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 8 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1008:14: note: called from here
y2 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 16)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1009:14: note: called from here
y3 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(y + i + 24)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1011:14: note: called from here
x0 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 0 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1012:14: note: called from here
x1 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 8 )));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1013:14: note: called from here
x2 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 16)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:52:1: error: inlining failed in call to always_inline ‘_mm256_cvtph_ps’: target specific option mismatch
_mm256_cvtph_ps (__m128i __A)
^~~~~~~~~~~~~~~
ggml.c:1014:14: note: called from here
x3 = _mm256_cvtph_ps(_mm_loadu_si128((__m128i*)(x + i + 24)));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1021:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 0 ), _mm256_cvtps_ph(y0, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1022:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 8 ), _mm256_cvtps_ph(y1, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1023:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 16), _mm256_cvtps_ph(y2, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/include/immintrin.h:99,
from ggml.c:128:
/usr/lib/gcc/x86_64-linux-gnu/8/include/f16cintrin.h:73:1: error: inlining failed in call to always_inline ‘_mm256_cvtps_ph’: target specific option mismatch
_mm256_cvtps_ph (__m256 __A, const int __I)
^~~~~~~~~~~~~~~
ggml.c:1024:9: note: called from here
_mm_storeu_si128((__m128i*)(y + i + 24), _mm256_cvtps_ph(y3, 0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:127: ggml.o] Error 1
I commented out the avx lines in the makefile, like this:
diff --git a/Makefile b/Makefile
index 07c91c1..b89d835 100644
--- a/Makefile
+++ b/Makefile
@@ -62,10 +62,10 @@ ifeq ($(UNAME_M),x86_64)
endif
endif
ifeq ($(UNAME_S),Linux)
- AVX1_M := $(shell grep "avx " /proc/cpuinfo)
- ifneq (,$(findstring avx,$(AVX1_M)))
- CFLAGS += -mavx
- endif
+ #AVX1_M := $(shell grep "avx " /proc/cpuinfo)
+ #ifneq (,$(findstring avx,$(AVX1_M)))
+ # CFLAGS += -mavx
+ #endif
AVX2_M := $(shell grep "avx2 " /proc/cpuinfo)
ifneq (,$(findstring avx2,$(AVX2_M)))
CFLAGS += -mavx2
and that made it compile under Debian for me.
Thanks @sachac , that did the trick. Makefile will need some extra logic to be patched I assume
I also get this error on my Debian 11 server, with Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz ( no AVX2 only AVX )
Above patch to makefile allows compilation without AVX, however im struggling to compile with AVX support. happy to help where I can.
I am also seeing this on Fedora 36. The Makefile patch mentioned above worked for me.
@jaybinks Can you post the output of cat /proc/cpuinfo
?
@ggerganov cpu info as requested.
My CPU is a sandybridge architecture, however it shows avx instructions in the below output.
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 1 cpu cores : 4 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 4 initial apicid : 4 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3286.080 cache size : 8192 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 4 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 2772.126 cache size : 8192 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 5 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 2313.449 cache size : 8192 KB physical id : 0 siblings : 8 core id : 1 cpu cores : 4 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 5 initial apicid : 5 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x14 cpu MHz : 3800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown bogomips : 6822.28 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
gcc -I. -mavx -march=haswell -pthread -c ggml.c -o ggml.o
works as expected
gcc -I. -mavx -march=sandybridge -pthread -c ggml.c -o ggml.o
or
gcc -I. -mavx -march=native -pthread -c ggml.c -o ggml.o
fails as above.
ok this starts to get interesting .... dmidecode does NOT show AVX support !?
I found this thread... even though its about FreeBSD, its exactly what I'm seeing (https://www.dslreports.com/forum/r25939252-AVX-on-i7-2600k-and-ASUS-P8Z68-V-PRO)
dmidecode -tprocessor
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.
Handle 0x0004, DMI type 4, 42 bytes
Processor Information
Socket Designation: LGA1155
Type: Central Processor
Family: Core 2 Duo
Manufacturer: Intel
ID: A7 06 02 00 FF FB EB BF
Signature: Type 0, Family 6, Model 42, Stepping 7
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Multi-threading)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
Voltage: 1.0 V
External Clock: 100 MHz
Max Speed: 3800 MHz
Current Speed: 3439 MHz
Status: Populated, Enabled
Upgrade: Other
L1 Cache Handle: 0x0005
L2 Cache Handle: 0x0006
L3 Cache Handle: 0x0007
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Part Number: To Be Filled By O.E.M.
Core Count: 4
Core Enabled: 1
Thread Count: 2
Characteristics:
64-bit capable
@aryker what CPU do you have ?
Interestingly... using intrin.sh from https://stackoverflow.com/questions/43128698/inlining-failed-in-call-to-always-inline-mm-mullo-epi32-target-specific-opti suggests we want -mf16c not -mavx , does this make sense ?
doesn't work for me on my machine ... it compiles but I get "Illegal instruction" at runtime.
A real head-scratcher, since I should have AVX support.
yea so this is basically the same as https://github.com/ggerganov/whisper.cpp/issues/238
Sandy Bridge CPU ... no F16c support ... ok... ill give up and be happy with my little win from -Ofast
@aryker what CPU do you have ?
Here's my cpuinfo file:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping : 7
microcode : 0x2f
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4984.27
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping : 7
microcode : 0x2f
cpu MHz : 959.352
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4984.27
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping : 7
microcode : 0x2f
cpu MHz : 800.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4984.27
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping : 7
microcode : 0x2f
cpu MHz : 1507.355
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4984.27
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
@aryker
This CPU has AVX but does not have F16C. This combination is currently not supported in ggml.c
.
The Makefile
has to be adjusted to not add the -mavx
flag if F16C is not available:
https://github.com/ggerganov/whisper.cpp/blob/4c1fe0c813135fbccc487b85f024e633a911aeba/Makefile#L53-L104
Same problem here on an AMD Opteron(TM) Processor 6220 AMD flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid ex td_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 top oext perfctr_core perfctr_nb cpb hw_pstate ssbd vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
AVX supported but F16C is not. Compiled without optimization and is being incredibly slow, will linking to openblas make this better? Thanks
I can report similar issue here. On Xeon(R) CPU E5-2670 running Proxmox 7.4 i also had to disable "-mavx" to get code compiled.
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110
cpuinfo tells AVX but no F16C:
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
microcode : 0x718
cpu MHz : 3300.000
cache size : 20480 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 47
initial apicid : 47
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips : 5202.46
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Doing that everything is slow both on 4 and 32 threads ./bench -m ./models/ggml-medium.bin
system_info: n_threads = 4 / 32 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
whisper_print_timings: load time = 1107.48 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 191019.75 ms / 1 runs (191019.75 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 192274.55 ms
system_info: n_threads = 32 / 32 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
whisper_print_timings: load time = 1100.58 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 54810.91 ms / 1 runs (54810.91 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 56057.46 ms
That makes my i3-7100 looking like a fast machine! ;)
system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
whisper_print_timings: load time = 396.43 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 38768.12 ms / 1 runs (38768.12 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 39215.11 ms
``