exllama
Run on CPU without AVX2
Hello, I have a server with an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz and 5x WX9100 GPUs, and I want to run Mistral 7B on each GPU. But I get the error "Illegal instruction (core dumped)" when I try. Is it possible to run exllama on a CPU without AVX2?
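For reference: whether the CPU actually advertises AVX2 can be read from the flags line of /proc/cpuinfo. The E5-2620 is a Sandy Bridge part, which supports AVX but not AVX2 (AVX2 first shipped with Haswell), so on that machine the check below should report AVX2 as missing:

```shell
# Report whether the kernel sees AVX and AVX2 in the CPU feature flags.
# -w matches whole words, so "avx" will not match the "avx2" token.
for ext in avx avx2; do
    if grep -qw "$ext" /proc/cpuinfo; then
        echo "$ext: yes"
    else
        echo "$ext: no"
    fi
done
```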
Are you on the latest version?
Steps:

git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt
python test_benchmark_inference.py -d <path_to_model_files> -p -ppl
Result:

python test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl
Successfully preprocessed all matching files.
 -- Perplexity:
 -- - Dataset: datasets/wikitext2_val_sample.jsonl
 -- - Chunks: 100
 -- - Chunk size: 2048 -> 2048
 -- - Chunk overlap: 0
 -- - Min. chunk size: 50
 -- - Key: text
 -- Tokenizer: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/tokenizer.model
 -- Model config: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/config.json
 -- Model: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/model.safetensors
 -- Sequence length: 2048
 -- Tuning:
 -- --sdp_thd: 8
 -- --matmul_recons_thd: 8
 -- --fused_mlp_thd: 2
 -- --rmsnorm_no_half2
 -- --rope_no_half2
 -- --matmul_no_half2
 -- --silu_no_half2
 -- Options: ['perf', 'perplexity']
 ** Time, Load model: 21.56 seconds
 ** Time, Load tokenizer: 0.02 seconds
 -- Groupsize (inferred): 128
 -- Act-order (inferred): yes
 ** VRAM, Model: [cuda:0] 3,877.87 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
 ** VRAM, Cache: [cuda:0] 256.00 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
 -- Warmup pass 1...
Illegal instruction (core dumped)
As far as I know, "Illegal instruction (core dumped)" means the binary executed a CPU instruction (such as an AVX2 instruction) that the processor does not support. When I tried the GGUF format with llama.cpp I got the same "Illegal instruction (core dumped)".
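A side note on the llama.cpp data point: hitting the same SIGILL there usually means that binary was also compiled with AVX2 enabled (the default when building on a machine that has it). If I remember correctly, the llama.cpp CMake build of that era exposed per-extension switches; the option names may have been renamed in newer checkouts (to GGML_AVX2 etc.), so treat this as a sketch and check `cmake -LH` in your tree first:

```shell
# Hypothetical rebuild of llama.cpp with the AVX2/FMA/F16C code paths
# disabled; option names are from the CMake build current at the time.
cmake -B build -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF
cmake --build build --config Release
```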
Maybe this gives more information about the error:
gdb --args python3 test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl
#0  0x00007fff4e89540e in rocblas_hgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#1  0x00007fff86e491dd in hipblasHgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/libhipblas.so
#2  0x00007ffe8ba50855 in q4_matmul_recons_cuda(ExLlamaTuning*, __half const*, int, Q4Matrix*, __half*, void*, bool) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#3  0x00007ffe8ba364e8 in q4_matmul(at::Tensor, unsigned long, at::Tensor) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#4  0x00007ffe8ba4e423 in pybind11::cpp_function::initialize<void (&)(at::Tensor, unsigned long, at::Tensor), void, at::Tensor, unsigned long, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [10]>(void (&)(at::Tensor, unsigned long, at::Tensor), void (*)(at::Tensor, unsigned long, at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [10])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#5  0x00007ffe8ba4aa4d in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#6  0x00005555556ae10e in ?? ()
#7  0x00005555556a4a7b in _PyObject_MakeTpCall ()
#8  0x000055555569d096 in _PyEval_EvalFrameDefault ()
#9  0x00005555556ae9fc in _PyFunction_Vectorcall ()
#10 0x000055555569ccfa in _PyEval_EvalFrameDefault ()
#11 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#12 0x000055555569745c in _PyEval_EvalFrameDefault ()
#13 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#14 0x000055555569745c in _PyEval_EvalFrameDefault ()
#15 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#16 0x000055555569745c in _PyEval_EvalFrameDefault ()
#17 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#18 0x000055555569745c in _PyEval_EvalFrameDefault ()
#19 0x00005555556bc7f1 in ?? ()
#20 0x000055555569853c in _PyEval_EvalFrameDefault ()
#21 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#22 0x000055555569726d in _PyEval_EvalFrameDefault ()
#23 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#24 0x000055555569726d in _PyEval_EvalFrameDefault ()
#25 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#26 0x000055555569726d in _PyEval_EvalFrameDefault ()
#27 0x00005555556939c6 in ?? ()
#28 0x0000555555789256 in PyEval_EvalCode ()
#29 0x00005555557b4108 in ?? ()
#30 0x00005555557ad9cb in ?? ()
#31 0x00005555557b3e55 in ?? ()
#32 0x00005555557b3338 in _PyRun_SimpleFileObject ()
#33 0x00005555557b2f83 in _PyRun_AnyFileObject ()
#34 0x00005555557a5a5e in Py_RunMain ()
#35 0x000055555577c02d in Py_BytesMain ()
#36 0x00007ffff7c7ed90 in __libc_start_call_main (main=main@entry=0x55555577bff0, argc=argc@entry=6, argv=argv@entry=0x7fffffffe328) at ../sysdeps/nptl/libc_start_call_main.h:58
#37 0x00007ffff7c7ee40 in __libc_start_main_impl (main=0x55555577bff0, argc=6, argv=0x7fffffffe328, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318) at ../csu/libc-start.c:392
#38 0x000055555577bf25 in _start ()
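Frame #0 is inside rocblas_hgemm, so the faulting instruction lives in the rocBLAS library shipped with the ROCm PyTorch wheel rather than in exllama's own extension. One way to confirm it really is an unsupported instruction is to disassemble the faulting address at the point of the crash (a sketch of a gdb session; addresses and output will vary):

(gdb) run
Program received signal SIGILL, Illegal instruction.
(gdb) x/i $pc          # disassemble the instruction that faulted
(gdb) info symbol $pc  # name the function/library it belongs to

If x/i $pc decodes to an AVX2-only mnemonic (vpermd, vgatherdps, vpbroadcastd, ...), that library was built assuming AVX2 and would need to be rebuilt for a pre-Haswell CPU.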