serge
error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
Hi!
Sorry to file this as an issue, but I'm stuck on it. I've followed the README, but I run into quite a few errors when trying to get it running. Maybe I'm just missing a dependency or something like that, but I haven't figured it out myself yet, and I'm wondering whether others might be hitting the same thing. I've tried this on two fairly clean Ubuntu 22.04 machines with the same results.
After the initial Docker pulls complete, the build fails with the following output:
Status: Downloaded newer image for gcc:10.2
---> 987c8580a041
Step 2/12 : WORKDIR /tmp
---> Running in 6eb681888247
Removing intermediate container 6eb681888247
---> 0999a4b386ae
Step 3/12 : RUN git clone https://github.com/ggerganov/llama.cpp.git --branch master-d5850c5
---> Running in 7f705b3f31b9
Cloning into 'llama.cpp'...
Note: checking out 'd5850c53ca179b9674b98f35d359763416a3cc11'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
Removing intermediate container 7f705b3f31b9
---> 1a27e05ce64f
Step 4/12 : RUN cd llama.cpp && make && mv main llama
---> Running in eae1a4f90a3a
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC: cc (GCC) 10.2.0
I CXX: g++ (GCC) 10.2.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3 -c ggml.c -o ggml.o
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
ggml.c: In function 'ggml_vec_dot_f16':
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
[the same 'inlining failed' error is repeated five more times, for the GGML_F16_VEC_LOAD expansions at ggml.c:1318 and ggml.c:1319]
make: *** [Makefile:221: ggml.o] Error 1
The command '/bin/sh -c cd llama.cpp && make && mv main llama' returned a non-zero code: 2
ERROR: Service 'api' failed to build : Build failed
I have the same problem.
I've noticed that the errors refer to /usr/local/lib/gcc, which doesn't exist on my machine, even though gcc and build-essential are installed.
This issue arises while compiling https://github.com/ggerganov/llama.cpp from source. The simplest fix here would be to use the prebuilt Docker image from that repo as a build stage instead of compiling from source.
I've opened a PR that should address this issue using the method outlined in my previous comment. I've run this locally with no issues since it circumvents the need for compiling entirely.
I'm sorry for not understanding, but what is done with the repository you linked in your other comment?
Seems related to https://github.com/ggerganov/llama.cpp/issues/196 and by extension https://github.com/ggerganov/llama.cpp/discussions/535.
I switched the VMware EVC CPU mode to "Haswell" in my environment and the compilation then completed.
Hey everyone! If this is still an issue, is anyone willing to try out the changes in the following PR: https://github.com/nsarrazin/serge/pull/109
It should hopefully fix things by switching to llama-rs. Just do:
git checkout feature/llama-rs-caching
docker compose up -d --build
I deployed this PR as a test and I'm still running into the problem on my Dell PowerEdge R620, but I believe that machine doesn't support the AVX instruction set, so this may be moot as far as it's concerned. I can also test-deploy on my known-working setup of an Ubuntu VM on a MacBook if that helps.
@willjasen did you figure out the issue?