Illegal instruction after running `gpt4all-lora-quantized-linux-x86`
I'm trying to run gpt4all-lora-quantized-linux-x86 on an Ubuntu Linux machine with 240 Intel(R) Xeon(R) CPU E7-8880 v2 @ 2.50GHz processors and 295 GB of RAM. No GPUs installed.
Ubuntu 22.04 running on VMware ESXi.
I get the following error: Illegal instruction
willem@ubuntu:/data/chat$ gdb -q ./gpt4all-lora-quantized-linux-x86
Reading symbols from ./gpt4all-lora-quantized-linux-x86...
(No debugging symbols found in ./gpt4all-lora-quantized-linux-x86)
(gdb) run
Starting program: /data/gpt4all/chat/gpt4all-lora-quantized-linux-x86
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680171804
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 240 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
Program received signal SIGILL, Illegal instruction.
0x0000000000425282 in ggml_set_f32 ()
There are 665 instructions in that function, and some of them require AVX and AVX2. The instruction at 0x0000000000425282 is "vbroadcastss ymm1,xmm0" (C4 E2 7D 18 C8), and it requires AVX2. It lies right at the beginning of ggml_set_f32, and the only earlier AVX instruction is vmovss, which requires only AVX. So vbroadcastss was most likely just the first AVX2-requiring instruction that your CPU encountered.
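(In case anyone wants to reproduce this lookup: the faulting instruction can be disassembled straight from the shipped binary with objdump; the address window below is just a small range I picked around the 0x425282 from the gdb session.)
# disassemble a small window around the faulting address
objdump -d -M intel --start-address=0x425270 --stop-address=0x4252a0 ./gpt4all-lora-quantized-linux-x86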
So you need a CPU with AVX2 support to run this. As far as I can see, the E7-8880 v2 supports only AVX, not AVX2. But in the output you've provided I see
AVX = 1 | AVX2 = 1
which confuses me a bit.
Lol, figured that out.
ggml_cpu_has_avx2 proc near
mov eax, 1
retn
ggml_cpu_has_avx2 endp
ggml_cpu_has_avx2 is basically "return true;" in the code. ggml_cpu_has_avx and ggml_cpu_has_sse3 are the same. Interestingly, ggml_cpu_has_avx512 is "return false;". In other words, there are no real checks behind these statistics (printed by _Z23llama_print_system_infov after the "system_info: " part of the line); everything is decided at compile time, not at run time.
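So that system_info line only tells you how the binary was built, not what the machine can do. To see what the CPU actually supports, the flags line in /proc/cpuinfo is more trustworthy; a quick check on Linux could look like this (the flag names are the literal ones Linux reports):
# print only the SIMD-related flags this CPU actually advertises
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -Ex 'avx|avx2|fma|f16c'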
FWIW I ran into a similar problem running a VM under Proxmox. I was able to work around this by setting the CPU type to "host", which exposed the full instruction set (for a Ryzen 9 5900X in my case) and then it worked. ~Not sure if you can do something similar in ESXi, but thought I'd mention this in case it helps you.~ NM, I see your 240 Xeons don't have AVX2 support - bummer!
Certainly sounds like @qinidema is onto something here though. ;]
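(For reference, on Proxmox that switch can also be done from the CLI, if I remember the syntax right; 100 below is just a placeholder VM id.)
qm set 100 --cpu host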
Why is AVX2 necessary anyway? Is there a workaround?
lol the gif demo has AVX = 0 | AVX2 = 0 (M1 Mac)
This probably just needs a simple recompile.
@mvrozanti you need to recompile this to get a new binary with your compile-time defines. @pirate486743186 yep.
@qinidema That fork also didn't work for me:
main: seed = 1680211596
llama_model_load: loading model from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
what(): basic_ios::clear: iostream error
At least I got an error with this one. What made you believe that specific fork would work?
It's the actual source code of the project. Same error here too. I tried cmake; it didn't work either:
sudo apt install libpthreadpool-dev
cmake .
make chat (or make -lpthread chat)
/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x170b0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x17113): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/quantize.dir/build.make:119: quantize] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:99: CMakeFiles/quantize.dir/all] Error 2
gmake: *** [Makefile:103: all] Error 2
@mvrozanti
At least I got an error with this one. What made you believe that specific fork would work?
That line in this (current) repository:
For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.
@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:
wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
cmake ..
make
Then I got the "chat" executable (and at least it starts and shows the help message successfully), as well as "quantize" and the "libggml.a" library. CMake 3.25.1, make 4.3, gcc 12.2.0, glibc 2.36, Arch Linux. Here is the full make log.
I'm on Debian 11. It's probably an incompatibility with older versions. It would be easier to just get a precompiled binary.
@nomic-ai Can you please fix this first? It just needs a recompile with generic flags.
@pirate486743186 try this: no-avx2.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 (static versions included too). Still requires AVX, FMA and F16C though (I can recompile without them too).
BTW, AVX2 is "on by default" for non-MSVC x86 builds in that repo; there is no check of actual CPU features even at compile time. I think that needs to be corrected.
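Until that is corrected, a rough workaround sketch (assuming a Linux build from a build directory inside the gpt4all.cpp tree) is to pick the cmake flag based on what /proc/cpuinfo reports:
# disable AVX2 in the build if the host CPU does not advertise it
if grep -qw avx2 /proc/cpuinfo; then
  cmake ..
else
  cmake -D LLAMA_NO_AVX2=1 ..
fi
make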
With the static compile, it again gives "illegal instruction". I have an old laptop.
With cmake, it apparently can't find pthread when compiling:
/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2
Same issue; though I get illegal instruction in
0x000055555558a1c9 in ggml_type_sizef ()
Tried compiling gpt4all.cpp myself, with and without the -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 -D LLAMA_NO_AVX=1 flags; same issue.
For those who are frustrated, keep in mind it was released 2 days ago. Have ultra-low expectations.
I have an Intel i5-3320M with no AVX2 or FMA support. I followed these steps:
https://github.com/nomic-ai/gpt4all/issues/82#issuecomment-1492001832
@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:
wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
and then
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
and it worked. On my laptop it is very slow, as would be expected.
@pirate486743186 and what is the address of that instruction?
@mvrozanti you can try this one: no-avx-avx2-fma-f16c.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 .. and additionally commented out the line with F16C.
@vsl-iil you can try the archive above too, though I cannot determine what instruction is at your address 0x000055555558a1c9; there's some kind of heavy ASLR in your case.
It doesn't say. I have AVX and F16C. These builds should probably work for most or even all CPUs. It worked with no-avx-avx2-fma-f16c.tar.gz (you purged everything lol).
For the pthread link errors above, add this to CMakeLists.txt after line 25:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
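Alternatively (untested sketch), the same flags can probably be passed on the cmake command line instead of editing CMakeLists.txt:
cmake -DCMAKE_C_FLAGS="-pthread" -DCMAKE_CXX_FLAGS="-pthread" ..
make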
I opened another issue for Debian 11: #180.
It compiles with this fix and the one mentioned there. But unfortunately, when it starts it gives llama_model_load: Segmentation fault
@pirate486743186 can you share your resulting compiled binary and core dump after the crash?
You can list available core dumps with coredumpctl list (it's better to start with a fresh one, i.e. launch it again and get a fresh crash at that llama_model_load line) and then export it with coredumpctl -o core.dump dump 1234, where 1234 is the PID of that fresh crash.
The dump is 2 GB; see #180. https://drive.google.com/file/d/1UUOae8oAerUTG9aucMYpKsa8EWs0-hXJ/view?usp=share_link
The compiled file: chat.zip
I'm using this; it seems to work better: https://github.com/ggerganov/llama.cpp
To use it, you'll need to convert the model first. Download the tokenizer file: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
Then run these commands, adapted to your case (they are working on a new unified converter script):
python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model
python3 migrate-ggml-2023-03-30-pr613.py models/gpt4all-7B/gpt4all-lora-quantized.bin models/gpt4all-7B/gpt4all-lora-quantized-new.bin
Then you run it with something like this (the parameters aren't quite right; you'll need to adjust them for better behavior):
./main -i --interactive-first -r "### Human:" -c 2048 --temp 0.1 --ignore-eos -b 1024 -n 10 --repeat_penalty 1.2 --instruct --color -m out.bin
I am using an old MacBook Pro (mid-2012 Intel model) with 8 GB RAM.
$ wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
$ unzip master.zip
$ cd gpt4all.cpp-master
$ mkdir build; cd build
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
This worked for me! But very very slow! I am going to upgrade my RAM tomorrow and see if that helps!
It needs 4 GB; more RAM will not help. Try llama.cpp from above, it's more efficient (you'll need to convert the model). For me it takes some time to start talking every time it's its turn, but after that the tokens come at a tolerably slow speed.
Over the next months/year, efficiency should increase by a lot. In general, software is inefficient and slow at first.
Can you provide suggestions on how to fix this error?
davesoma@Dave:~/gpt4all_/gpt4all.cpp-master/build$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
main: seed = 1681664229
llama_model_load: loading model from '/home/davesoma/gpt4all/chat/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
Segmentation fault
The same thing happens to me:

I only have 4 GB of RAM. Is this the problem?
When doing cmake, I get:
cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 ..
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
CMake Error in CMakeLists.txt:
  Target "chat" requires the language dialect "CXX20" (with compiler extensions), but CMake does not know the compile flags to use to enable it.
CMake Error in CMakeLists.txt:
  Target "quantize" requires the language dialect "CXX20" (with compiler extensions), but CMake does not know the compile flags to use to enable it.
-- Generating done
-- Build files have been written to: /root/master-gpt4all/gpt4all.cpp-master/build
And it fails with:
make
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Linking CXX executable chat
libggml.a(ggml.c.o): In function `ggml_graph_compute':
ggml.c:(.text+0x1a876): undefined reference to `pthread_join'
ggml.c:(.text+0x1a952): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/chat.dir/build.make:121: recipe for target 'chat' failed
make[2]: *** [chat] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/chat.dir/all' failed
make[1]: *** [CMakeFiles/chat.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2