Output is only repeated words like "cluster"/"mass" or characters like "!" and "G", on both Windows 10 and WSL (Ubuntu), in a conda environment
CPU: Intel(R) Core(TM) i3-3217U CPU @ 1.80GHz
=== System: Windows 10 (PowerShell, conda) === Python 3.9.22
cmake --version
cmake version 3.29.5-msvc4
clang --version
clang version 18.1.8
Target: i686-pc-windows-msvc
Thread model: posix
InstalledDir: D:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\Llvm\bin
When I run python .\run_inference.py -m .\BitNet-b1.58-2B-4T\ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv, I get the following output:
...............................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 32
llama_new_context_with_model: n_ubatch = 32
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 150.00 MiB
llama_new_context_with_model: KV self size = 150.00 MiB, K (f16): 75.00 MiB, V (f16): 75.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 15.97 MiB
llama_new_context_with_model: graph nodes = 1116
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
main: chat template example:
System: You are a helpful assistantUser: Hello<|eot_id|>Assistant: Hi thereUser: How are you?<|eot_id|>Assistant:
system_info: n_threads = 2 (n_threads_batch = 2) / 4 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampler seed: 1165020431
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 2048, n_batch = 1, n_predict = 128, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
System: You are a helpful assistant
> Hello
mam cluster mass mass mass mass mass mass cluster cluster cluster cluster cluster cluster cluster mass cluster cluster cluster cluster leh cluster cluster mass cluster area cluster cluster pivot cluster trend pivot cluster cluster cluster mass cluster mass cluster mam mass cluster mass cluster mass mass cluster cluster cluster cluster mass cluster cluster cluster cluster cluster cluster cluster cluster cluster leh mass cluster cluster cluster mam leh cluster area cluster cluster cluster cluster cluster trend cluster cluster cluster cluster cluster cluster mass cluster mass area mass cluster mass cluster cluster mam area mass cluster cluster mass cluster cluster mass pivot trend cluster cluster cluster cluster cluster mass cluster trend cluster cluster cluster cluster cluster cluster cluster cluster cluster area cluster
>
llama_perf_sampler_print: sampling time = 35.06 ms / 128 runs ( 0.27 ms per token, 3650.68 tokens per second)
llama_perf_context_print: load time = 15358.15 ms
llama_perf_context_print: prompt eval time = 13490.70 ms / 16 tokens ( 843.17 ms per token, 1.19 tokens per second)
llama_perf_context_print: eval time = 57412.87 ms / 119 runs ( 482.46 ms per token, 2.07 tokens per second)
llama_perf_context_print: total time = 1010511.00 ms / 135 tokens
Interrupted by user
Ctrl+C pressed, exiting...
When I run python .\run_inference.py -m .\BitNet-b1.58-2B-4T\ggml-model-i2_s.gguf -p "You are a helpful assistant" -temp 0 -cnv, I get the following output:
...............................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 32
llama_new_context_with_model: n_ubatch = 32
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 150.00 MiB
llama_new_context_with_model: KV self size = 150.00 MiB, K (f16): 75.00 MiB, V (f16): 75.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 15.97 MiB
llama_new_context_with_model: graph nodes = 1116
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
main: chat template example:
System: You are a helpful assistantUser: Hello<|eot_id|>Assistant: Hi thereUser: How are you?<|eot_id|>Assistant:
system_info: n_threads = 2 (n_threads_batch = 2) / 4 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampler seed: 4294967295
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> greedy
generate: n_ctx = 2048, n_batch = 1, n_predict = 128, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
System: You are a helpful assistant
> Hello
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
llama_perf_sampler_print: sampling time = 30.26 ms / 128 runs ( 0.24 ms per token, 4229.45 tokens per second)
llama_perf_context_print: load time = 2969.97 ms
llama_perf_context_print: prompt eval time = 10672.81 ms / 16 tokens ( 667.05 ms per token, 1.50 tokens per second)
llama_perf_context_print: eval time = 58496.80 ms / 119 runs ( 491.57 ms per token, 2.03 tokens per second)
llama_perf_context_print: total time = 104886.30 ms / 135 tokens
Interrupted by user
Ctrl+C pressed, exiting...
=== System: WSL2 (Ubuntu, conda) === Python 3.9.21
cmake --version
cmake version 3.22.1
clang --version
Ubuntu clang version 19.1.7 (++20250114103320+cd708029e0b2-1~exp1~20250114103432.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
When I run python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv, I get the following output:
...............................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 32
llama_new_context_with_model: n_ubatch = 32
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 150.00 MiB
llama_new_context_with_model: KV self size = 150.00 MiB, K (f16): 75.00 MiB, V (f16): 75.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 15.97 MiB
llama_new_context_with_model: graph nodes = 1116
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
main: chat template example:
System: You are a helpful assistantUser: Hello<|eot_id|>Assistant: Hi thereUser: How are you?<|eot_id|>Assistant:
system_info: n_threads = 2 (n_threads_batch = 2) / 4 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampler seed: 204485193
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 2048, n_batch = 1, n_predict = 128, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
System: You are a helpful assistant
> Hello
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
>
Ctrl+C pressed, exiting...
llama_perf_sampler_print: sampling time = 33.06 ms / 128 runs ( 0.26 ms per token, 3871.63 tokens per second)
llama_perf_context_print: load time = 5192.28 ms
llama_perf_context_print: prompt eval time = 4681.59 ms / 16 tokens ( 292.60 ms per token, 3.42 tokens per second)
llama_perf_context_print: eval time = 11151.59 ms / 119 runs ( 93.71 ms per token, 10.67 tokens per second)
llama_perf_context_print: total time = 23696.05 ms / 135 tokens
Interrupted by user
When I run python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -temp 0 -cnv, I get the same output as in PowerShell:
...............................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 32
llama_new_context_with_model: n_ubatch = 32
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 150.00 MiB
llama_new_context_with_model: KV self size = 150.00 MiB, K (f16): 75.00 MiB, V (f16): 75.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 15.97 MiB
llama_new_context_with_model: graph nodes = 1116
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
main: chat template example:
System: You are a helpful assistantUser: Hello<|eot_id|>Assistant: Hi thereUser: How are you?<|eot_id|>Assistant:
system_info: n_threads = 2 (n_threads_batch = 2) / 4 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampler seed: 4294967295
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> greedy
generate: n_ctx = 2048, n_batch = 1, n_predict = 128, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
System: You are a helpful assistant
> Hello
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> Ctrl+C pressed, exiting...
llama_perf_sampler_print: sampling time = 36.30 ms / 128 runs ( 0.28 ms per token, 3526.56 tokens per second)
llama_perf_context_print: load time = 1742.86 ms
llama_perf_context_print: prompt eval time = 4305.54 ms / 16 tokens ( 269.10 ms per token, 3.72 tokens per second)
llama_perf_context_print: eval time = 12661.84 ms / 119 runs ( 106.40 ms per token, 9.40 tokens per second)
llama_perf_context_print: total time = 19272.35 ms / 135 tokens
Interrupted by user
I'm not sure whether this problem is caused by the build configuration. When I build the project with make -j4, I get warnings like these:
[ 1%] Building C object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 2%] Building C object 3rdparty/llama.cpp/examples/gguf-hash/CMakeFiles/xxhash.dir/deps/xxhash/xxhash.c.o
[ 3%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 4%] Building C object 3rdparty/llama.cpp/examples/gguf-hash/CMakeFiles/sha256.dir/deps/sha256/sha256.c.o
[ 4%] Built target build_info
[ 5%] Building C object 3rdparty/llama.cpp/examples/gguf-hash/CMakeFiles/sha1.dir/deps/sha1/sha1.c.o
[ 5%] Built target sha256
[ 5%] Building C object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 5%] Built target sha1
[ 6%] Building CXX object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend.cpp.o
[ 7%] Building C object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o
In file included from /home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:7:
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-quants.h:153:7: warning: no newline at end of file [-Wnewline-eof]
153 | #endif
| ^
In file included from /home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-quants.c:4:
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-quants.h:153:7: warning: no newline at end of file [-Wnewline-eof]
153 | #endif
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12514:6: warning: no previous prototype for function 'float_act_quant' [-Wmissing-prototypes]
12514 | void float_act_quant(const int K, float* B, int32_t* dst, float* act_scale) {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12514:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
12514 | void float_act_quant(const int K, float* B, int32_t* dst, float* act_scale) {
| ^
| static
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12534:18: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12534 | if (fabs(A[i]) > max){
| ~~~~ ^~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12535:32: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12535 | i2_scale[0] = fabs(A[i]);
| ~~~~ ^~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12545:37: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12545 | dst[i] = (double)A[i] * i2_scale[0] > 0 ? 1 : -1;
| ~ ^~~~~~~~~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12530:6: warning: no previous prototype for function 'weight_quant_f32' [-Wmissing-prototypes]
12530 | void weight_quant_f32(const int M, const int K, float* A, int32_t* dst, float* i2_scale) {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12530:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
12530 | void weight_quant_f32(const int M, const int K, float* A, int32_t* dst, float* i2_scale) {
| ^
| static
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12554:18: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12554 | if (fabs(temp_A) > max){
| ~~~~ ^~~~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12555:32: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12555 | i2_scale[0] = fabs(temp_A);
| ~~~~ ^~~~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12566:39: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion]
12566 | dst[i] = (double)temp_A * i2_scale[0] > 0 ? 1 : -1;
| ~ ^~~~~~~~~~~
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12550:6: warning: no previous prototype for function 'weight_quant_f16' [-Wmissing-prototypes]
12550 | void weight_quant_f16(const int M, const int K, uint16_t* A, int32_t* dst, float* i2_scale) {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12550:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
12550 | void weight_quant_f16(const int M, const int K, uint16_t* A, int32_t* dst, float* i2_scale) {
| ^
| static
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12571:6: warning: no previous prototype for function 'matrixMultiply_int' [-Wmissing-prototypes]
12571 | void matrixMultiply_int(const int M, const int N, const int K, const int32_t* A, const int32_t* B, int32_t* C) {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:12571:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
12571 | void matrixMultiply_int(const int M, const int N, const int K, const int32_t* A, const int32_t* B, int32_t* C) {
| ^
| static
[ 7%] Built target xxhash
[ 8%] Building CXX object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/__/__/__/__/src/ggml-bitnet-mad.cpp.o
In file included from /home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:5:
In file included from /home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-quants.h:4:
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:154:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
154 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:154:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:175:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
175 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:175:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:196:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
196 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:196:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:261:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
261 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:261:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:294:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
294 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:294:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:311:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
311 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:311:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
[ 9%] Building CXX object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/__/__/__/__/src/ggml-bitnet-lut.cpp.o
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-quants.c:16121:2: warning: no newline at end of file [-Wnewline-eof]
16121 | }
| ^
In file included from /home/jaysk/BitNet/src/ggml-bitnet-lut.cpp:9:
In file included from /home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-quants.h:4:
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:154:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
154 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:154:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:175:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
175 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:175:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:196:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
196 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:196:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:261:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
261 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:261:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:294:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
294 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:294:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:311:9: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
311 | struct {
| ^
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/./ggml-common.h:311:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
12 warnings generated.
[ 10%] Building CXX object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/llamafile/sgemm.cpp.o
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:46:100: warning: unused parameter 'quant_weights' [-Wunused-parameter]
46 | size_t quantize_i2_s(const float * src, void * dst, int64_t nrow, int64_t n_per_row, const float * quant_weights) {
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:95:39: warning: cast from 'const void *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
95 | const uint8_t * x = (uint8_t *)vx;
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:96:38: warning: cast from 'const void *' to 'signed char *' drops const qualifier [-Wcast-qual]
96 | const int8_t * y = (int8_t *)vy;
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:95:24: warning: unused variable 'x' [-Wunused-variable]
95 | const uint8_t * x = (uint8_t *)vx;
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:96:24: warning: unused variable 'y' [-Wunused-variable]
96 | const int8_t * y = (int8_t *)vy;
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:99:15: warning: unused variable 'group32_num' [-Wunused-variable]
99 | const int group32_num = nb / 32;
| ^~~~~~~~~~~
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:100:15: warning: unused variable 'la_num' [-Wunused-variable]
100 | const int la_num = nb % 32;
| ^~~~~~
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:101:15: warning: unused variable 'groupla_num' [-Wunused-variable]
101 | const int groupla_num = nb % 32 != 0 ? 1 : 0;
| ^~~~~~~~~~~
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:94:42: warning: unused parameter 's' [-Wunused-parameter]
94 | void ggml_vec_dot_i2_i8_s(int n, float * s, size_t bs, const void * vx, size_t bx, const void * vy, size_t by, int nrc) {
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:94:52: warning: unused parameter 'bs' [-Wunused-parameter]
94 | void ggml_vec_dot_i2_i8_s(int n, float * s, size_t bs, const void * vx, size_t bx, const void * vy, size_t by, int nrc) {
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:94:80: warning: unused parameter 'bx' [-Wunused-parameter]
94 | void ggml_vec_dot_i2_i8_s(int n, float * s, size_t bs, const void * vx, size_t bx, const void * vy, size_t by, int nrc) {
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:94:108: warning: unused parameter 'by' [-Wunused-parameter]
94 | void ggml_vec_dot_i2_i8_s(int n, float * s, size_t bs, const void * vx, size_t bx, const void * vy, size_t by, int nrc) {
| ^
/home/jaysk/BitNet/src/ggml-bitnet-mad.cpp:94:116: warning: unused parameter 'nrc' [-Wunused-parameter]
94 | void ggml_vec_dot_i2_i8_s(int n, float * s, size_t bs, const void * vx, size_t bx, const void * vy, size_t by, int nrc) {
| ^
25 warnings generated.
[ 10%] Building C object 3rdparty/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
In file included from /home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-aarch64.c:8:
/home/jaysk/BitNet/3rdparty/llama.cpp/ggml/src/ggml-quants.h:153:7: warning: no newline at end of file [-Wnewline-eof]
153 | #endif
| ^
1 warning generated.
2 warnings generated.
11 warnings generated.
[ 11%] Linking CXX shared library libggml.so
[ 11%] Built target ggml
[ 12%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 13%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/llama.cpp.o
[ 14%] Building CXX object 3rdparty/llama.cpp/examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 15%] Building CXX object 3rdparty/llama.cpp/examples/gguf/CMakeFiles/llama-gguf.dir/gguf.cpp.o
[ 16%] Linking CXX executable ../../../../bin/llama-gguf
[ 16%] Built target llama-gguf
[ 17%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 18%] Linking CXX executable ../../../../bin/llama-gguf-hash
[ 18%] Built target llama-gguf-hash
[ 19%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 19%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/unicode.cpp.o
[ 20%] Building CXX object 3rdparty/llama.cpp/src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 21%] Linking CXX shared library libllama.so
[ 21%] Built target llama
[ 22%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/arg.cpp.o
[ 23%] Building CXX object 3rdparty/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o
[ 24%] Building CXX object 3rdparty/llama.cpp/examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/quantize-stats.cpp.o
[ 25%] Building CXX object 3rdparty/llama.cpp/examples/simple/CMakeFiles/llama-simple.dir/simple.cpp.o
[ 25%] Linking CXX executable ../../../../bin/llama-simple
[ 25%] Built target llama-simple
[ 25%] Building CXX object 3rdparty/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o
[ 26%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
[ 26%] Linking CXX executable ../../../../bin/llama-quantize-stats
[ 26%] Built target llama-quantize-stats
[ 27%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
[ 28%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 28%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/log.cpp.o
[ 29%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 29%] Built target llava
[ 30%] Linking CXX static library libllava_static.a
[ 30%] Built target llava_static
[ 31%] Linking CXX shared library libllava_shared.so
[ 31%] Built target llava_shared
[ 32%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
[ 33%] Building CXX object 3rdparty/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
[ 34%] Linking CXX static library libcommon.a
[ 34%] Built target common
[ 35%] Building CXX object 3rdparty/llama.cpp/examples/batched-bench/CMakeFiles/llama-batched-bench.dir/batched-bench.cpp.o
[ 37%] Building CXX object 3rdparty/llama.cpp/examples/baby-llama/CMakeFiles/llama-baby-llama.dir/baby-llama.cpp.o
[ 37%] Building CXX object 3rdparty/llama.cpp/examples/cvector-generator/CMakeFiles/llama-cvector-generator.dir/cvector-generator.cpp.o
[ 38%] Building CXX object 3rdparty/llama.cpp/examples/batched/CMakeFiles/llama-batched.dir/batched.cpp.o
[ 39%] Linking CXX executable ../../../../bin/llama-baby-llama
[ 40%] Linking CXX executable ../../../../bin/llama-batched-bench
[ 40%] Built target llama-baby-llama
[ 41%] Building CXX object 3rdparty/llama.cpp/examples/convert-llama2c-to-ggml/CMakeFiles/llama-convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 41%] Linking CXX executable ../../../../bin/llama-batched
[ 41%] Built target llama-batched-bench
[ 42%] Building CXX object 3rdparty/llama.cpp/examples/embedding/CMakeFiles/llama-embedding.dir/embedding.cpp.o
[ 42%] Built target llama-batched
[ 43%] Building CXX object 3rdparty/llama.cpp/examples/eval-callback/CMakeFiles/llama-eval-callback.dir/eval-callback.cpp.o
[ 44%] Linking CXX executable ../../../../bin/llama-eval-callback
[ 45%] Linking CXX executable ../../../../bin/llama-cvector-generator
[ 45%] Built target llama-eval-callback
[ 46%] Building CXX object 3rdparty/llama.cpp/examples/export-lora/CMakeFiles/llama-export-lora.dir/export-lora.cpp.o
[ 46%] Linking CXX executable ../../../../bin/llama-embedding
[ 46%] Built target llama-cvector-generator
[ 47%] Building CXX object 3rdparty/llama.cpp/examples/gbnf-validator/CMakeFiles/llama-gbnf-validator.dir/gbnf-validator.cpp.o
[ 47%] Built target llama-embedding
[ 48%] Building CXX object 3rdparty/llama.cpp/examples/gguf-split/CMakeFiles/llama-gguf-split.dir/gguf-split.cpp.o
[ 49%] Linking CXX executable ../../../../bin/llama-convert-llama2c-to-ggml
[ 49%] Built target llama-convert-llama2c-to-ggml
[ 50%] Building CXX object 3rdparty/llama.cpp/examples/gritlm/CMakeFiles/llama-gritlm.dir/gritlm.cpp.o
[ 50%] Linking CXX executable ../../../../bin/llama-gbnf-validator
[ 50%] Built target llama-gbnf-validator
[ 51%] Building CXX object 3rdparty/llama.cpp/examples/imatrix/CMakeFiles/llama-imatrix.dir/imatrix.cpp.o
[ 51%] Linking CXX executable ../../../../bin/llama-gguf-split
[ 52%] Linking CXX executable ../../../../bin/llama-export-lora
[ 52%] Built target llama-gguf-split
[ 53%] Building CXX object 3rdparty/llama.cpp/examples/infill/CMakeFiles/llama-infill.dir/infill.cpp.o
[ 53%] Built target llama-export-lora
[ 54%] Building CXX object 3rdparty/llama.cpp/examples/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
[ 55%] Linking CXX executable ../../../../bin/llama-gritlm
[ 55%] Built target llama-gritlm
[ 56%] Building CXX object 3rdparty/llama.cpp/examples/llava/CMakeFiles/llama-llava-cli.dir/llava-cli.cpp.o
[ 56%] Linking CXX executable ../../../../bin/llama-infill
[ 57%] Linking CXX executable ../../../../bin/llama-llava-cli
[ 57%] Built target llama-infill
[ 58%] Building CXX object 3rdparty/llama.cpp/examples/llava/CMakeFiles/llama-minicpmv-cli.dir/minicpmv-cli.cpp.o
[ 58%] Built target llama-llava-cli
[ 59%] Building CXX object 3rdparty/llama.cpp/examples/lookahead/CMakeFiles/llama-lookahead.dir/lookahead.cpp.o
[ 60%] Linking CXX executable ../../../../bin/llama-imatrix
[ 60%] Built target llama-imatrix
[ 61%] Building CXX object 3rdparty/llama.cpp/examples/lookup/CMakeFiles/llama-lookup.dir/lookup.cpp.o
[ 62%] Linking CXX executable ../../../../bin/llama-minicpmv-cli
[ 63%] Linking CXX executable ../../../../bin/llama-lookahead
[ 63%] Built target llama-minicpmv-cli
[ 64%] Building CXX object 3rdparty/llama.cpp/examples/lookup/CMakeFiles/llama-lookup-create.dir/lookup-create.cpp.o
[ 64%] Built target llama-lookahead
[ 65%] Building CXX object 3rdparty/llama.cpp/examples/lookup/CMakeFiles/llama-lookup-merge.dir/lookup-merge.cpp.o
[ 65%] Linking CXX executable ../../../../bin/llama-lookup
[ 65%] Built target llama-lookup
[ 66%] Building CXX object 3rdparty/llama.cpp/examples/lookup/CMakeFiles/llama-lookup-stats.dir/lookup-stats.cpp.o
[ 67%] Linking CXX executable ../../../../bin/llama-lookup-merge
[ 67%] Built target llama-lookup-merge
[ 68%] Building CXX object 3rdparty/llama.cpp/examples/main/CMakeFiles/llama-cli.dir/main.cpp.o
[ 69%] Linking CXX executable ../../../../bin/llama-lookup-create
[ 69%] Built target llama-lookup-create
[ 70%] Building CXX object 3rdparty/llama.cpp/examples/parallel/CMakeFiles/llama-parallel.dir/parallel.cpp.o
[ 70%] Linking CXX executable ../../../../bin/llama-lookup-stats
[ 70%] Built target llama-lookup-stats
[ 71%] Building CXX object 3rdparty/llama.cpp/examples/passkey/CMakeFiles/llama-passkey.dir/passkey.cpp.o
[ 72%] Linking CXX executable ../../../../bin/llama-parallel
[ 72%] Linking CXX executable ../../../../bin/llama-cli
[ 72%] Built target llama-parallel
[ 73%] Building CXX object 3rdparty/llama.cpp/examples/perplexity/CMakeFiles/llama-perplexity.dir/perplexity.cpp.o
[ 73%] Built target llama-cli
[ 74%] Building CXX object 3rdparty/llama.cpp/examples/quantize/CMakeFiles/llama-quantize.dir/quantize.cpp.o
[ 74%] Linking CXX executable ../../../../bin/llama-passkey
[ 74%] Built target llama-passkey
[ 75%] Building CXX object 3rdparty/llama.cpp/examples/retrieval/CMakeFiles/llama-retrieval.dir/retrieval.cpp.o
[ 76%] Linking CXX executable ../../../../bin/llama-quantize
[ 77%] Linking CXX executable ../../../../bin/llama-bench
[ 77%] Built target llama-quantize
[ 77%] Generating theme-snowstorm.css.hpp
[ 77%] Built target llama-bench
[ 78%] Generating colorthemes.css.hpp
[ 79%] Building CXX object 3rdparty/llama.cpp/examples/save-load-state/CMakeFiles/llama-save-load-state.dir/save-load-state.cpp.o
[ 80%] Generating completion.js.hpp
[ 81%] Generating index-new.html.hpp
[ 82%] Generating index.html.hpp
[ 83%] Generating index.js.hpp
[ 84%] Generating json-schema-to-grammar.mjs.hpp
[ 85%] Linking CXX executable ../../../../bin/llama-retrieval
[ 86%] Generating loading.html.hpp
[ 86%] Generating prompt-formats.js.hpp
[ 86%] Generating style.css.hpp
[ 87%] Generating system-prompts.js.hpp
[ 88%] Generating theme-beeninorder.css.hpp
[ 89%] Generating theme-ketivah.css.hpp
[ 89%] Built target llama-retrieval
[ 90%] Generating theme-mangotango.css.hpp
[ 91%] Building CXX object 3rdparty/llama.cpp/examples/speculative/CMakeFiles/llama-speculative.dir/speculative.cpp.o
[ 92%] Generating theme-playground.css.hpp
[ 93%] Generating theme-polarnight.css.hpp
[ 94%] Building CXX object 3rdparty/llama.cpp/examples/server/CMakeFiles/llama-server.dir/server.cpp.o
[ 95%] Linking CXX executable ../../../../bin/llama-save-load-state
[ 95%] Built target llama-save-load-state
[ 96%] Building CXX object 3rdparty/llama.cpp/examples/tokenize/CMakeFiles/llama-tokenize.dir/tokenize.cpp.o
[ 97%] Linking CXX executable ../../../../bin/llama-tokenize
[ 97%] Built target llama-tokenize
[ 98%] Linking CXX executable ../../../../bin/llama-perplexity
[ 98%] Built target llama-perplexity
[ 99%] Linking CXX executable ../../../../bin/llama-speculative
[ 99%] Built target llama-speculative
[100%] Linking CXX executable ../../../../bin/llama-server
[100%] Built target llama-server
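For reference, this is roughly how I configured and rebuilt the project from scratch (a minimal sketch of my build steps; the explicit clang flags are my assumption about what setup_env.py passes to CMake, so the exact options may differ):

# clean rebuild, forcing the clang toolchain shown above (flags are my assumption)
rm -rf build
cmake -B build -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build build --config Release -j4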
I see the same problem reported in "Model only outputs G repeatedly in interactive mode with ggml-model-i2_s.gguf" #195. I don't know whether the quant_type caused this; maybe I'll try another parameter such as -q tl1.
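Below is a rough sketch of how I plan to try that (assuming setup_env.py accepts -q tl1 for this model the same way it accepts -q i2_s, and that the converted model is named ggml-model-tl1.gguf; I haven't verified either):

# re-run environment setup with the TL1 kernel instead of I2_S (flag and output filename are assumptions)
python setup_env.py -md models/BitNet-b1.58-2B-4T -q tl1
# then run inference against the newly converted model
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-tl1.gguf -p "You are a helpful assistant" -cnv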
Thanks for your help!
I got the same issue on an i7-3635QM.