Illegal instruction after running `gpt4all-lora-quantized-linux-x86`
I'm trying to run gpt4all-lora-quantized-linux-x86 on an Ubuntu Linux machine with 240 Intel(R) Xeon(R) CPU E7-8880 v2 @ 2.50GHz processors and 295 GB of RAM. No GPUs installed.
Ubuntu 22.04 running on VMware ESXi.
I get the following error: Illegal instruction
willem@ubuntu:/data/chat$ gdb -q ./gpt4all-lora-quantized-linux-x86
Reading symbols from ./gpt4all-lora-quantized-linux-x86...
(No debugging symbols found in ./gpt4all-lora-quantized-linux-x86)
(gdb) run
Starting program: /data/gpt4all/chat/gpt4all-lora-quantized-linux-x86
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680171804
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 240 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
Program received signal SIGILL, Illegal instruction.
0x0000000000425282 in ggml_set_f32 ()
There are 665 instructions in that function, and some of them require AVX and AVX2. The instruction at 0x0000000000425282 is "vbroadcastss ymm1,xmm0" (C4 E2 7D 18 C8), and it requires AVX2. It lies right at the beginning of ggml_set_f32, and the only earlier AVX instruction is vmovss, which requires only AVX. So vbroadcastss was most likely just the first AVX2-requiring instruction that your CPU encountered.
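(In case anyone wants to reproduce this lookup: the faulting instruction can be disassembled straight from the shipped binary with objdump; the address window below is just a small range I picked around the 0x425282 from the gdb session.)
# disassemble a small window around the faulting address
objdump -d -M intel --start-address=0x425270 --stop-address=0x4252a0 ./gpt4all-lora-quantized-linux-x86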
So you need a CPU with AVX2 support to run this. As far as I can see, the E7-8880 v2 supports only AVX, not AVX2. But in the output you've provided I see
AVX = 1 | AVX2 = 1
which confuses me a bit.
Lol, figured that out.
ggml_cpu_has_avx2 proc near
mov eax, 1
retn
ggml_cpu_has_avx2 endp
ggml_cpu_has_avx2 is basically "return true;" in the code. ggml_cpu_has_avx and ggml_cpu_has_sse3 are the same. Interestingly, ggml_cpu_has_avx512 is "return false;". In other words, there are no real checks behind these statistics (printed by _Z23llama_print_system_infov after the "system_info: " part of the line); everything is decided at compile time, not at run time.
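So that system_info line only tells you how the binary was built, not what the machine can do. To see what the CPU actually supports, the flags line in /proc/cpuinfo is more trustworthy; a quick check on Linux could look like this (the flag names are the literal ones Linux reports):
# print only the SIMD-related flags this CPU actually advertises
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -Ex 'avx|avx2|fma|f16c'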
FWIW I ran into a similar problem running a VM under Proxmox. I was able to work around this by setting the CPU type to "host", which exposed the full instruction set (for a Ryzen 9 5900X in my case) and then it worked. ~Not sure if you can do something similar in ESXi, but thought I'd mention this in case it helps you.~ NM, I see your 240 Xeons don't have AVX2 support - bummer!
Certainly sounds like @qinidema is onto something here though. ;]
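(For reference, on Proxmox that switch can also be done from the CLI, if I remember the syntax right; 100 below is just a placeholder VM id.)
qm set 100 --cpu host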
Why is AVX2 necessary anyway? Is there a workaround?
lol the gif demo has AVX = 0 | AVX2 = 0 (M1 Mac)
This probably just needs a simple recompile.
@mvrozanti you need to recompile this to get a new binary with your compile-time defines. @pirate486743186 yep.
@qinidema That fork also didn't work for me:
main: seed = 1680211596
llama_model_load: loading model from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
what(): basic_ios::clear: iostream error
At least I got an error with this one. What made you believe that specific fork would work?
It's the actual source code of the project. Same error here too. I tried cmake; it didn't work either:
sudo apt install libpthreadpool-dev
cmake .
make chat (or make -lpthread chat)
/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x170b0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x17113): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/quantize.dir/build.make:119: quantize] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:99: CMakeFiles/quantize.dir/all] Error 2
gmake: *** [Makefile:103: all] Error 2
@mvrozanti
At least I got an error with this one. What made you believe that specific fork would work?
That line in this (current) repository:
For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.
@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:
wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
cmake ..
make
Then I got the "chat" executable (and at least it starts and shows the help message successfully), as well as "quantize" and the "libggml.a" library. CMake 3.25.1, make 4.3, gcc 12.2.0, glibc 2.36, Arch Linux. Here is the full make log.
I'm on Debian 11. It's probably an incompatibility with older versions. It would be easier to just get a precompiled binary.
@nomic-ai Can you please fix this first? It just needs a recompile with generic flags.
@pirate486743186 try this: no-avx2.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 (static versions included too). Still requires AVX, FMA and F16C though (I can recompile without them too).
BTW, AVX2 is "on by default" for non-MSVC x86 builds in that repo; there is no check of actual CPU features even at compile time. I think that needs to be corrected.
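Until that is corrected, a rough workaround sketch (assuming a Linux build from a build directory inside the gpt4all.cpp tree) is to pick the cmake flag based on what /proc/cpuinfo reports:
# disable AVX2 in the build if the host CPU does not advertise it
if grep -qw avx2 /proc/cpuinfo; then
  cmake ..
else
  cmake -D LLAMA_NO_AVX2=1 ..
fi
make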
With the static compile, it again gives "illegal instruction". I have an old laptop.
With cmake, it apparently can't find pthread when compiling:
/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2
Same issue; though I get illegal instruction in
0x000055555558a1c9 in ggml_type_sizef ()
Tried compiling gpt4all.cpp myself, with and without the -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 -D LLAMA_NO_AVX=1 flags; same issue.
For those who are frustrated, keep in mind it was released 2 days ago. Have ultra-low expectations.
I have an Intel i5-3320M with no AVX2 or FMA support. I followed these steps:
https://github.com/nomic-ai/gpt4all/issues/82#issuecomment-1492001832
@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:
wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
and then
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
and it worked. On my laptop it is very slow, as would be expected.
@pirate486743186 and what is the address of that instruction?
@mvrozanti you can try this one: no-avx-avx2-fma-f16c.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 .. and additionally commented out the line with F16C.
@vsl-iil you can try the archive above too, though I cannot determine what instruction is at your address 0x000055555558a1c9; there's some kind of heavy ASLR in your case.
It doesn't say. I have AVX and F16C. These builds should probably work for most or even all CPUs. It worked with no-avx-avx2-fma-f16c.tar.gz (you purged everything lol).
For the pthread link errors above, add this to CMakeLists.txt after line 25:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
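Alternatively (untested sketch), the same flags can probably be passed on the cmake command line instead of editing CMakeLists.txt:
cmake -DCMAKE_C_FLAGS="-pthread" -DCMAKE_CXX_FLAGS="-pthread" ..
make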
I opened another issue for Debian 11: #180.
It compiles with this fix and the one mentioned there. But unfortunately, when it starts it gives llama_model_load: Segmentation fault
@pirate486743186 can you share your resulting compiled binary and core dump after the crash?
You can list available core dumps with coredumpctl list (it's better to start with a fresh one, i.e. launch it again and get a fresh crash at that llama_model_load line) and then export it with coredumpctl -o core.dump dump 1234, where 1234 is the PID of that fresh crash.
The dump is 2 GB; see #180. https://drive.google.com/file/d/1UUOae8oAerUTG9aucMYpKsa8EWs0-hXJ/view?usp=share_link
The compiled file: chat.zip
I'm using this; it seems to work better: https://github.com/ggerganov/llama.cpp
To use it, you'll need to convert the model first. Download the tokenizer file: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
Then run these commands, adapted to your case (they are working on a new unified converter script):
python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model
python3 migrate-ggml-2023-03-30-pr613.py models/gpt4all-7B/gpt4all-lora-quantized.bin models/gpt4all-7B/gpt4all-lora-quantized-new.bin
Then you run it with something like this (the parameters aren't quite right; you'll need to adjust them for better behavior):
./main -i --interactive-first -r "### Human:" -c 2048 --temp 0.1 --ignore-eos -b 1024 -n 10 --repeat_penalty 1.2 --instruct --color -m out.bin
I am using an old MacBook Pro (mid-2012 Intel model) with 8 GB RAM.
$ wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
$ unzip master.zip
$ cd gpt4all.cpp-master
$ mkdir build; cd build
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
This worked for me! But very very slow! I am going to upgrade my RAM tomorrow and see if that helps!
It needs 4 GB; more RAM will not help. Try llama.cpp from above, it's more efficient (you'll need to convert the model). For me it takes some time to start talking every time it's its turn, but after that the tokens come at a tolerably slow speed.
Over the next months/year, efficiency should increase by a lot. In general, software is inefficient and slow at first.
Can you provide suggestions on how to fix this error?
davesoma@Dave:~/gpt4all_/gpt4all.cpp-master/build$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
main: seed = 1681664229
llama_model_load: loading model from '/home/davesoma/gpt4all/chat/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
Segmentation fault
The same thing happens to me:

I only have 4 GB of RAM. Is this the problem?
When doing cmake, I get:
cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 ..
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
CMake Error in CMakeLists.txt:
  Target "chat" requires the language dialect "CXX20" (with compiler extensions), but CMake does not know the compile flags to use to enable it.
CMake Error in CMakeLists.txt:
  Target "quantize" requires the language dialect "CXX20" (with compiler extensions), but CMake does not know the compile flags to use to enable it.
-- Generating done
-- Build files have been written to: /root/master-gpt4all/gpt4all.cpp-master/build
And it fails with:
make
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Linking CXX executable chat
libggml.a(ggml.c.o): In function `ggml_graph_compute':
ggml.c:(.text+0x1a876): undefined reference to `pthread_join'
ggml.c:(.text+0x1a952): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/chat.dir/build.make:121: recipe for target 'chat' failed
make[2]: *** [chat] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/chat.dir/all' failed
make[1]: *** [CMakeFiles/chat.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2