llama.cpp
Compilation error on Nvidia Jetson Nano
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do.
Hi! I'm trying to compile llama.cpp on an Nvidia Jetson Nano 2GB with cuBLAS, because I want to use the CUDA cores, but I'm running into compilation issues.
Current Behavior
Please provide a detailed written description of what llama.cpp did, instead.
Both the make and cmake compilation methods result in various errors.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 1
Model name: Cortex-A57
Stepping: r1p1
CPU max MHz: 1479,0000
CPU min MHz: 102,0000
BogoMIPS: 38.40
L1d cache: 32K
L1i cache: 48K
L2 cache: 2048K
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
- Operating System, e.g. for Linux:
$ uname -a
Linux rover-NVIDIA-JETSON 4.9.337-tegra #1 SMP PREEMPT Thu Jun 8 21:19:14 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.6.9
$ make --version
GNU Make 4.1
$ g++ --version
g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
$ cmake --version
cmake version 3.28.0-rc4
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
Failure Information (for bugs)
Please help provide information about the failure / bug.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- make
- make LLAMA_CUBLAS=1
- make LLAMA_CUBLAS=1 with -arch=compute_53
- make LLAMA_CUBLAS=1 with -arch=compute_53 and removed -mcpu
- cmake .. + cmake --build . --config Release (failure)
- cmake .. -DLLAMA_CUBLAS=ON + cmake --build . --config Release
Failure Logs
$ make
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS:
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native -c ggml-quants.c -o ggml-quants.o
ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:3680:41: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
^
{ }
ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: invalid initializer
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3723:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3725:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3727:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:4353:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:4371:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4373:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5273:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:5291:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
ggml-quants.c:5300:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5918:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5926:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:5927:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:6627:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:6629:43: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
^
{ }
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:6641:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: invalid initializer
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:6643:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:6686:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^
cc1: some warnings being treated as errors
Makefile:542: recipe for target 'ggml-quants.o' failed
make: *** [ggml-quants.o] Error 1
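All of these errors trace back to GCC 7's arm_neon.h, which lacks the vld1q_*_x2/x4 intrinsics that the ggml_vld1q_* macros expand to. As an illustration only (the fallback idea matches what was later patched upstream, but the function name here is hypothetical), each missing intrinsic can be emulated with plain loads:

```c
// Sketch: emulate vld1q_u8_x2 on compilers whose <arm_neon.h> lacks it
// (e.g. GCC 7 on aarch64). Upstream wires an equivalent in through the
// ggml_vld1q_* macros; the _fallback name is mine.
#include <arm_neon.h>

static inline uint8x16x2_t ggml_vld1q_u8_x2_fallback(const uint8_t *ptr) {
    uint8x16x2_t res;
    res.val[0] = vld1q_u8(ptr);       // first 16 bytes
    res.val[1] = vld1q_u8(ptr + 16);  // next 16 bytes
    return res;
}
```

The s8/s16 and _x4 variants follow the same pattern with vld1q_s8/vld1q_s16 and four loads.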
$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
Setting the -arch to compute_53 in the Makefile.
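The edit amounts to something like this (a sketch; the exact NVCCFLAGS line differs between llama.cpp revisions):

```make
# Replace nvcc's -arch=native (not supported by CUDA 10.2's nvcc) with
# the Jetson Nano's Maxwell compute capability:
NVCCFLAGS += -arch=compute_53
```

Then rerunning: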
make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : 'cpu=native': expected a number
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
I could not find what number I had to insert here for the cpu variable (presumably nvcc misparses the forwarded -mcpu=native as its own -m option, which expects 32 or 64), and removing it also fails:
make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -c ggml.c -o ggml.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c llama.cpp -o llama.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/sampling.cpp -o sampling.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/grammar-parser.cpp -o grammar-parser.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/build-info.cpp -o build-info.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/console.cpp -o console.o
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(5933): error: identifier "CUBLAS_TF32_TENSOR_OP_MATH" is undefined
ggml-cuda.cu(6578): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7514): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7548): error: identifier "CUBLAS_COMPUTE_16F" is undefined
4 errors detected in the compilation of "/tmp/tmpxft_00001bc3_00000000-6_ggml-cuda.cpp1.ii".
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
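For what it's worth, CUBLAS_TF32_TENSOR_OP_MATH and the CUBLAS_COMPUTE_* identifiers only exist from CUDA 11 onward; the cuBLAS that ships with CUDA 10.2 only has the older math-mode and data-type enums. A compatibility shim along these lines (a sketch of what later llama.cpp versions do; I have not verified it on CUDA 10.2) maps the new names back:

```c
// Sketch: map CUDA 11 cuBLAS names to their CUDA 10.x stand-ins.
// Guarding on CUDART_VERSION keeps the shim inert on newer toolkits.
#include <cuda_runtime.h>
#include <cublas_v2.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11000
#define CUBLAS_TF32_TENSOR_OP_MATH CUBLAS_TENSOR_OP_MATH
#define CUBLAS_COMPUTE_16F         CUDA_R_16F
#define CUBLAS_COMPUTE_32F         CUDA_R_32F
#define cublasComputeType_t        cudaDataType_t
#endif
```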
$ cmake ..
-- cuBLAS found
-- Using CUDA architectures: 53
GNU ld (GNU Binutils for Ubuntu) 2.30
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done (0.4s)
-- Generating done (0.9s)
-- Build files have been written to: /home/rover/llama.cpp/build
cmake --build . --config Release
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3680:41: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
^
{ }
/home/rover/llama.cpp/ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: invalid initializer
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3723:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3725:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3727:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:4353:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:4371:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4373:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5273:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:5291:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
/home/rover/llama.cpp/ggml-quants.c:5300:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5918:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5926:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:5927:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:6627:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:6629:43: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
^
{ }
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:6641:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: invalid initializer
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6643:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:6686:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^
cc1: some warnings being treated as errors
CMakeFiles/ggml.dir/build.make:117: recipe for target 'CMakeFiles/ggml.dir/ggml-quants.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
CMakeFiles/Makefile2:623: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2
$ cmake .. -DLLAMA_CUBLAS=ON
-- cuBLAS found
-- Using CUDA architectures: 53
GNU ld (GNU Binutils for Ubuntu) 2.30
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done (0.3s)
-- Generating done (0.3s)
-- Build files have been written to: /home/rover/llama.cpp/build
$ cmake --build . --config Release
(build output identical to the plain cmake build above: the same NEON intrinsic errors in ggml-quants.c, ending in make: *** [all] Error 2)
Similar error to #3880. I'm not sure what more to do, or if this is even supported, because the Nvidia Jetson Nano runs Ubuntu 18.04 with CUDA 10.2, which is older, and I cannot upgrade. I would really appreciate it if someone could help me figure this out; if I need to provide more information, let me know!
What kind of Jetson board are you using? I might have some Jetsons at work to try on, but I want to check whether your board and mine have the same hardware.
I'm using this board: https://developer.nvidia.com/embedded/jetson-nano-developer-kit
I would very much appreciate the help!
I'm in the same boat right now with the latest cuda-10.2 (from apt install nvidia-jetpack): the files are all present in /usr/local/cuda-10.2/, but I'm getting the same errors.
I've contemplated just installing the Ubuntu repository drivers instead of the JetPack-specific variants, just to see whether it works at all.
> I'm using this board: https://developer.nvidia.com/embedded/jetson-nano-developer-kit
> I would very much appreciate the help!
I'm pretty sure I have the exact same ones at work, so I'm going to try to borrow one; it might take a couple of days.
Potentially related?: https://github.com/ggerganov/llama.cpp/issues/4123
EDIT: I've run it with the patch file from that issue, but I get some new errors which I don't know how to solve. Did anyone else make some progress?
make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --compiler-options=" " -use_fast_math -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native -c ggml.c -o ggml.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c llama.cpp -o llama.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c common/sampling.cpp -o sampling.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c common/grammar-parser.cpp -o grammar-parser.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c common/build-info.cpp -o build-info.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation -c common/console.cpp -o console.o
nvcc --compiler-options=" " -use_fast_math -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(5970): error: identifier "CUBLAS_TF32_TENSOR_OP_MATH" is undefined
ggml-cuda.cu(6617): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7552): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7586): error: identifier "CUBLAS_COMPUTE_16F" is undefined
4 errors detected in the compilation of "/tmp/tmpxft_00002e62_00000000-6_ggml-cuda.cpp1.ii".
Makefile:440: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
@staviq any luck so far on your end?
@rvandernoort Nothing conclusive; installing a newer CUDA (which would solve the problem) on the Jetson is hacky and unstable.
Supposedly Nvidia will support a more recent CUDA in the next release of their Jetson software bundle.
You need to provide CUDA_DOCKER_ARCH to make it work. E.g., `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_87 LLAMA_CUDA_F16=1 -j 10` for Jetson Orin, or `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_72 LLAMA_CUDA_F16=1 -j 10` for Jetson Xavier.
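For context, CUDA_DOCKER_ARCH works because the Makefile uses it to pick an explicit nvcc architecture instead of -arch=native, roughly like this (paraphrased from Makefiles of that period; check your copy, as the exact logic varies):

```make
# Sketch of the Makefile's arch selection: use an explicit architecture
# when CUDA_DOCKER_ARCH is set, otherwise fall back to -arch=native
# (which old nvcc releases such as 10.2 do not understand).
ifdef CUDA_DOCKER_ARCH
    NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
else
    NVCCFLAGS += -arch=native
endif
```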
> You need to provide CUDA_DOCKER_ARCH to make it work. E.g., `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_87 LLAMA_CUDA_F16=1 -j 10` for Jetson Orin or `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_72 LLAMA_CUDA_F16=1 -j 10` for Jetson Xavier.
This does not work on my Jetson Xavier; I'm still getting `error: identifier "CUBLAS_COMPUTE_16F" is undefined`.
@eternitybt are you using CUDA 10? If so, updating it should fix the problem (at least it's now working on my Xavier NX 16GB).
@vvsotnikov I am indeed still on CUDA 10. It's great to hear from someone who has actually done it that upgrading to CUDA 11 is possible on the Xavier NX! Do you maybe have a link to the instructions that worked for you? I find a lot of conflicting information on the internet, and I'm afraid of trying anything because I do not want to mess up my system.
@eternitybt I haven't updated it per se; I just installed JetPack 5.1.2 from scratch, which includes CUDA 11. If you can afford to wipe your Jetson, that's probably the easiest solution.
I can confirm that after updating to JetPack 5.1.2 and using the command `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_72 LLAMA_CUDA_F16=1 -j 10` provided by @vvsotnikov, I can successfully compile and run the newest version of llama.cpp on my Jetson Xavier NX. Big thanks!!
Compiling with gcc-10 and g++-10 is successful; that is the highest version available on the Jetson Nano Developer Kit.
Check this, I solved the compilation problem there: https://github.com/ggerganov/llama.cpp/issues/4123#issuecomment-1865614568
Thanks for all the suggestions. To conclude, here is how to install on an Nvidia Jetson Nano with Ubuntu 18.04 and CUDA 10.2 (a combined sketch follows the list):
- Compile and install GCC 8.5 from source and export it as CC and CXX
- Remove (comment out) the line `MK_CXXFLAGS += -mcpu=native` in the Makefile
- Run `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_52`
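Put together as a shell session, the recipe looks something like this (the GCC install prefix and the exact Makefile line are assumptions; adjust to your setup):

```sh
# Point the build at the locally built GCC 8.5 (install prefix assumed).
export CC=/usr/local/gcc-8.5/bin/gcc
export CXX=/usr/local/gcc-8.5/bin/g++

# Comment out the -mcpu=native line that nvcc chokes on.
sed -i 's/^MK_CXXFLAGS += -mcpu=native/#&/' Makefile

# Build with cuBLAS for the Nano's Maxwell GPU.
make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_52
```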
@rvandernoort Why do we need to compile GCC directly from source? I had a similar issue in ggml, and I could install gcc-8 and gcc-9 from apt:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-9 g++-9
and build ggml with:
cmake -DCMAKE_C_COMPILER=/usr/bin/gcc-9 -DCMAKE_CXX_COMPILER=/usr/bin/g++-9 ..
I hope this helps others who encounter the same issue on the Jetson Nano.
> @rvandernoort Why do we need to compile GCC directly from source? I had a similar issue in ggml, and I could install gcc-8 and gcc-9 from apt:
> sudo add-apt-repository ppa:ubuntu-toolchain-r/test
> sudo apt update
> sudo apt install gcc-9 g++-9
> and build ggml with:
> cmake -DCMAKE_C_COMPILER=/usr/bin/gcc-9 -DCMAKE_CXX_COMPILER=/usr/bin/g++-9 ..
> I hope this helps others who encounter the same issue on the Jetson Nano.
There is a version restriction on GCC with JetPack 4's CUDA 10: on the TX2 I can only use GCC 8 and below, and the Ubuntu-packaged GCC lacks these ARM intrinsics, so you need to build it from source.
@rvandernoort What kind of performance do you get out of this?
In my case running it without GPU offloading is actually faster than with it. Do you observe a similar pattern for your Jetson Nano?
> @rvandernoort What kind of performance do you get out of this?
> In my case running it without GPU offloading is actually faster than with it. Do you observe a similar pattern for your Jetson Nano?
I've done a small test to check your hypothesis, but for me the model loading time is similar, and GPU inference on a 4-bit quantized model is 3-4 times faster than CPU inference using the server.
One issue I did encounter was very slow behaviour when getting close to the RAM limit, so maybe model size is the issue?
> @rvandernoort What kind of performance do you get out of this? In my case running it without GPU offloading is actually faster than with it. Do you observe a similar pattern for your Jetson Nano?
>
> I've done a small test to check your hypothesis, but for me the model loading time is similar, and GPU inference on a 4-bit quantized model is 3-4 times faster than CPU inference using the server. One issue I did encounter was very slow behaviour when getting close to the RAM limit, so maybe model size is the issue?
Sorry, I just read your comment. I am running Phi-2, so it should not be a RAM problem. Weird. :/
Closing this, as my project with the Jetson has finished and I managed to compile.
If anyone else needs to run llama.cpp on a Jetson Nano, I have collected all the advice listed here into a GitHub gist.
Have fun!