
ARM device compile error.

Open theone10zero opened this issue 7 months ago • 11 comments

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:190:43: error: cannot convert ‘int16x8_t’ to ‘const int8x16_t’ in initialization
  190 |     const int8x16_t vec_zero = vdupq_n_s16(0x0000);
      |                                ~~~~~~~~~~~^~~~~~~~
      |                                           |
      |                                           int16x8_t

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:232:22: error: invalid operands to binary + (have ‘int16x8_t’ and ‘int8x16_t’)
  232 |         vec_c[0] += vec_v_right_1.val[0];

theone10zero avatar Apr 18 '25 03:04 theone10zero

Okay, these look like C++ compilation errors caused by incompatible data types used with ARM NEON intrinsics (special functions for accelerating computation on ARM CPUs). The errors occur in one of the header files used by the project. Let's break down each error and how to fix it.

Error 1:

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:190:43: error: cannot convert ‘int16x8_t’ to ‘const int8x16_t’ in initialization
  190 |     const int8x16_t vec_zero = vdupq_n_s16(0x0000);
      |                                ~~~~~~~~~~~^~~~~~~~
      |                                           |
      |                                           int16x8_t

  • Problem: The code tries to declare a variable vec_zero which is a NEON vector of sixteen 8-bit signed integers (int8x16_t). However, it tries to initialize it using the function vdupq_n_s16(0), which creates a NEON vector by duplicating a 16-bit signed integer, resulting in a different type (int16x8_t - a vector of eight 16-bit signed integers). The compiler cannot convert between these two different vector types.

  • Solution: Use the NEON intrinsic that creates a vector of zeros with the type int8x16_t. That function is vdupq_n_s8().

Error 2:

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:232:22: error: invalid operands to binary + (have ‘int16x8_t’ and ‘int8x16_t’)
  232 |         vec_c[0] += vec_v_right_1.val[0];
      |         ~~~~~~~~~~^~~~~~~~~~~~~~~

  • Problem: The code attempts to perform an addition (+=) between two variables of incompatible NEON vector types. The error message tells us one operand is int16x8_t (likely vec_v_right_1.val[0]) and the other is int8x16_t (likely vec_c[0]). You cannot directly add vectors with different element sizes and lengths.

  • Solution: This is harder to fix definitively without seeing more surrounding code. It likely requires either:

    • Changing the declared type of vec_c to match vec_v_right_1.val[0] (e.g., to int16x8_t) if it's intended as an accumulator for 16-bit values.
    • Or, converting/casting vec_v_right_1.val[0] to the correct type (int8x16_t) before the addition (though this might involve losing data or require more complex NEON instructions like vmovn_s16 or vqmovn_s16).
    • The GitHub issue found in the search results confirms others have hit these errors, suggesting it might be a bug in the code itself.

Step-by-Step Fix (Focusing on Error 1 first):

Let's fix the first error, which is unambiguous, and then see whether the second error persists or changes.
  • Identify the File: The error path points deep into the build structure (BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h). Based on our previous edits, the actual source file you need to modify is likely ~/BitNet/include/ggml-bitnet.h. Let's assume this is the correct file.

  • Edit the File: nano ~/BitNet/include/ggml-bitnet.h

  • Fix Line 190:

    • Use Ctrl + _ then type 190 and press Enter to jump to line 190.

    • You should see the line: const int8x16_t vec_zero = vdupq_n_s16(0x0000);

    • Change vdupq_n_s16(0x0000) to vdupq_n_s8(0): const int8x16_t vec_zero = vdupq_n_s8(0);

  • Save and Exit:

    • Press Ctrl + O, then Enter to save.
    • Press Ctrl + X to exit.
  • Clean Build Directory: It's best to remove the old build artifacts to ensure a clean compile.

    cd ~/BitNet/build
    rm -rf *

  • Retry Build: Re-run CMake and Make (still in ~/BitNet/build):

    cmake ..
    make -j$(nproc)

  • Report Outcome: Please run these steps. Let me know if the build now completes successfully, if you still get the second error (on line 232), or if a different error appears. If the error on line 232 still occurs, we may need to make an educated guess or look more closely at the code around that line.

adityaparsad01 avatar Apr 20 '25 05:04 adityaparsad01

I think I have the same issue.

Full build log attached.

I'm trying to compile under WSL on a Snapdragon X Elite CPU.

uname -a
Linux note 5.15.90.1-microsoft-standard-WSL2 #1 SMP Thu Jan 9 21:04:42 +06 2025 aarch64 aarch64 aarch64 GNU/Linux

compile.log

cm4ker avatar Apr 20 '25 17:04 cm4ker

This may be related; see my hack: https://github.com/microsoft/BitNet/issues/192#issuecomment-2818299330

Manamama avatar Apr 21 '25 12:04 Manamama

Also occurs on Ampere Altra Q64-30 (Arm ISA 8.2) and AmpereOne A192-32X (Arm ISA 8.6) CPUs.

Linux altrad8u 6.8.0-58-generic-64k #60-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 14 19:41:44 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

compile.log

vikingforties avatar Apr 23 '25 14:04 vikingforties

Also occurs on Ampere Altra Q64-30 (Arm ISA 8.2) and AmpereOne A192-32X (Arm ISA 8.6) CPUs.

Linux altrad8u 6.8.0-58-generic-64k #60-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 14 19:41:44 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

compile.log

Have you found a solution to it?

jasonsi1993 avatar Apr 25 '25 07:04 jasonsi1993

Not so far but we are looking into it.

vikingforties avatar Apr 25 '25 09:04 vikingforties

I compiled this model on Armbian and also got: error: cannot convert ‘int16x8_t’ to ‘const int8x16_t’ in initialization. I'm using an S905L3A CPU. Is there a solution?

dsruanjian1 avatar Apr 27 '25 06:04 dsruanjian1

FYI, an update on my Android box, in both Termux and PRoot-ed Debian:

  1. llava-cli has always been compiling fine, and working, for months.
  2. python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s hangs for over 12 hours.
  3. a regular cmake . in the same cloned BitNet folder works, but, as before, the Python script itself then refuses to work. Refs:
root@localhost:~/downloads_Termux/BitNet# ls
 3rdparty             LICENSE            run_inference.py
 assets               logs               SECURITY.md
 build                media              setup_env.py
 CMakeLists.txt       models             src
 CODE_OF_CONDUCT.md   preset_kernels     utils
 docs                 README.md
 include              requirements.txt
root@localhost:~/downloads_Termux/BitNet# cmake .
-- The C compiler identification is Clang 14.0.6
-- The CXX compiler identification is Clang 14.0.6
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found Git: /usr/bin/git (found version "2.39.5")
-- Found OpenMP_C: -fopenmp=libomp (found version "5.0")
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- OpenMP found
-- Using llamafile
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /root/downloads_Termux/BitNet
root@localhost:~/downloads_Termux/BitNet# make
[  1%] Building C object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml.c.o
In file included from /root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:7:
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-quants.h:153:7: warning: no newline at end of file [-Wnewline-eof]
#endif
      ^
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:2313:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
    GGML_F16_VEC_REDUCE(sumf, sum);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:1322:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:1312:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
    #define GGML_F32Cx4_REDUCE       GGML_F32x4_REDUCE
                                     ^
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:1242:13: note: expanded from macro 'GGML_F32x4_REDUCE'
    (res) = GGML_F32x4_REDUCE_ONE((x)[0]);         \
          ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:1227:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
                                 ^~~~~~~~~~~~~
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:2361:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
        GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/downloads_Termux/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:1322:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^

...[100%] Linking CXX executable ../../../../bin/llama-tokenize
[100%] Built target llama-tokenize
root@localhost:~/downloads_Termux/BitNet# 0# Run inference with the quantized model
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
bash: 0#: command not found
Traceback (most recent call last):
  File "/data/data/com.termux/files/home/downloads/BitNet/run_inference.py", line 56, in <module>
    run_inference()
  File "/data/data/com.termux/files/home/downloads/BitNet/run_inference.py", line 37, in run_inference
    run_command(command)
  File "/data/data/com.termux/files/home/downloads/BitNet/run_inference.py", line 11, in run_command
    subprocess.run(command, shell=shell, check=True)
  File "/usr/lib/python3.11/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.11/subprocess.py", line 1901, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'build/bin/llama-cli'

Reminder: hand-cloning llama.cpp there and hand-compiling it does produce a llama-cli binary, but it is incompatible with the BitNet format anyway.

Ref:

uname -a
Linux localhost 6.2.1-PRoot-Distro #1 SMP PREEMPT Thu Mar 17 16:28:22 CST 2022 aarch64 GNU/Linux

Identical symptoms on base Termux.

Manamama avatar Apr 27 '25 16:04 Manamama

It compiled for me: Debian (proot-distro) with a venv and clang 14.x.x (installed via apt install clang), using export CC=clang and export CXX=clang++. I had tried before with clang-19 but it didn't work for me, so I used the older version.
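A minimal sketch of the compiler override described above (assuming clang 14 is installed via apt install clang; the ~/BitNet/build path is illustrative, and the rebuild commands themselves are left as comments):

```shell
# Point CMake at clang instead of the default compiler.
export CC=clang
export CXX=clang++

# Then rebuild from a clean build directory, e.g.:
#   cd ~/BitNet/build && rm -rf * && cmake .. && make -j$(nproc)

echo "CC=$CC CXX=$CXX"
```

CMake reads CC and CXX only when it first configures a build directory, which is one reason a clean build directory matters after switching compilers.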

TanishkBansode avatar Apr 28 '25 03:04 TanishkBansode

Update from me: python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s hangs for over 70 hours, both in proot and in Termux. Symptom: cmake uses about 10 percent of one core out of 8 the whole time. I have given up. Reminder:

root@localhost:~/downloads_Termux/BitNet# clang --version
Debian clang version 14.0.6
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
root@localhost:~/downloads_Termux/BitNet# uname -a
Linux localhost 6.2.1-PRoot-Distro #1 SMP PREEMPT Thu Mar 17 16:28:22 CST 2022 aarch64 GNU/Linux
root@localhost:~/downloads_Termux/BitNet# 

Manamama avatar Apr 30 '25 15:04 Manamama

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:190:43: error: cannot convert ‘int16x8_t’ to ‘const int8x16_t’ in initialization
  190 |     const int8x16_t vec_zero = vdupq_n_s16(0x0000);
      |                                ~~~~~~~~~~~^~~~~~~~
      |                                           |
      |                                           int16x8_t

BitNet/3rdparty/llama.cpp/ggml/src/../../../../include/bitnet-lut-kernels.h:232:22: error: invalid operands to binary + (have ‘int16x8_t’ and ‘int8x16_t’)
  232 |         vec_c[0] += vec_v_right_1.val[0];

I also had the same problem, so I changed line 190 to

const int16x8_t vec_zero = vdupq_n_s16(0x0000);

and line 217 (and others) to

vec_c[0] += vreinterpretq_s16_s8(vec_v_left_0.val[0]);

After that, compilation works fine.

mgk757 avatar May 03 '25 15:05 mgk757