
Misc. bug: since b4800 llama-cli does not prompt and llama-bench shows no results

Open pabpas opened this issue 6 months ago • 2 comments

Name and Version

Last working version:

$ llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 131072 | matrix cores: none
version: 4799 (14dec0c2)
built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line


Problem description & steps to reproduce

Starting with b4800, llama-cli does not reach the prompt input; it stops here:

$ llama-cli -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37
[...]
main: interactive mode on.
sampler seed: 507615108
sampler params: 
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
        top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

and llama-bench shows no results (also no error):

$ llama-bench -m llama-2-7b.Q4_0.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 131072 | matrix cores: none

First Bad Commit

https://github.com/ggml-org/llama.cpp/commit/cc473cac7cea1484c1f870231073b0bf0352c6f9

Relevant log output


pabpas avatar May 11 '25 11:05 pabpas

Do you have any files with special or non-ASCII characters in the directory?

slaren avatar May 11 '25 12:05 slaren

Your question reminded me of https://github.com/ggml-org/llama.cpp/issues/11198. I had that one on Debian bookworm, but it got fixed when upgrading to trixie.

Anyway, the directory contained many files and I could not see any non-ASCII characters, but to be sure I put the .gguf in a directory of its own. Unfortunately the outcome is the same.

pabpas avatar May 11 '25 20:05 pabpas

Please try to obtain a callstack of the crash:

  • Make a debug build by adding -DCMAKE_BUILD_TYPE=Debug to the cmake command line
  • Run gdb --ex run --ex bt --args llama-cli -m <rest of the command line>
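For example, something along these lines (the Vulkan flag and model path here just mirror your setup and are only placeholders):

$ cmake -S . -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
$ cmake --build build
$ gdb --ex run --ex bt --args ./build/bin/llama-cli -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37

If it crashes, gdb should stop at the crash and bt will print the callstack.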

slaren avatar May 12 '25 11:05 slaren

There is no crash; it just stays there and I am not able to input anything.

Built like this:

$ cmake -S . -B build -G Ninja -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_BUILD_SERVER=ON 
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Vulkan found
-- GL_KHR_cooperative_matrix supported by glslc
-- GL_NV_cooperative_matrix2 supported by glslc
-- GL_EXT_integer_dot_product supported by glslc
-- GL_EXT_bfloat16 not supported by glslc
-- Including Vulkan backend
-- Configuring done (0.6s)
-- Generating done (0.1s)
-- Build files have been written to: /home/user/src/llama.cpp-vulkan/build

$ cmake --build build --config Release
[2/164] Generating build details from Git
-- Found Git: /usr/bin/git (found version "2.47.2")
[35/164] Generate vulkan shaders
ggml_vulkan: Generating and compiling shaders to SPIR-V
[113/164] Building CXX object ggml/src/g...eFiles/ggml-vulkan.dir/ggml-vulkan.cpp.o
/home/user/src/llama.cpp-vulkan/ggml/src/ggml-vulkan/ggml-vulkan.cpp: In function ‘vk_pipeline ggml_vk_guess_matmul_pipeline(ggml_backend_vk_context*, vk_matmul_pipeline&, uint32_t, uint32_t, bool, ggml_type, ggml_type)’:
/home/user/src/llama.cpp-vulkan/ggml/src/ggml-vulkan/ggml-vulkan.cpp:4428:175: warning: unused parameter ‘src1_type’ [-Wunused-parameter]
 4428 | static vk_pipeline ggml_vk_guess_matmul_pipeline(ggml_backend_vk_context * ctx, vk_matmul_pipeline& mmp, uint32_t m, uint32_t n, bool aligned, ggml_type src0_type, ggml_type src1_type) {
      |                                                                                                                                                                     ~~~~~~~~~~^~~~~~~~~
[164/164] Linking CXX executable bin/llama-server

$ sudo cmake --install build --config Release

gdb output:

$ gdb --ex run --ex bt --args llama-cli -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37
GNU gdb (Debian 16.3-1) 16.3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama-cli...
Starting program: /usr/local/bin/llama-cli -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe69ff6c0 (LWP 9042)]
[New Thread 0x7fffe60bd6c0 (LWP 9043)]
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 131072 | int dot: 1 | matrix cores: none
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (Intel(R) Graphics (BMG G21))
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (12th Gen Intel(R) Core(TM) i7-12700K)
[New Thread 0x7fffe881e6c0 (LWP 9044)]
build: 5345 (3eac2093) with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Graphics (BMG G21)) - 12216 MiB free
llama_model_loader: loaded meta data with 37 key-value pairs and 327 tensors from Ministral-8B-Instruct-2410.q8.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Ministral 8B Instruct 2410
llama_model_loader: - kv   3:                            general.version str              = 2410
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Ministral
llama_model_loader: - kv   6:                         general.size_label str              = 8B
llama_model_loader: - kv   7:                            general.license str              = other
llama_model_loader: - kv   8:                       general.license.name str              = mrl
llama_model_loader: - kv   9:                       general.license.link str              = https://mistral.ai/licenses/MRL-0.1.md
llama_model_loader: - kv  10:                          general.languages arr[str,10]      = ["en", "fr", "de", "es", "it", "pt", ...
llama_model_loader: - kv  11:                          llama.block_count u32              = 36
llama_model_loader: - kv  12:                       llama.context_length u32              = 32768
llama_model_loader: - kv  13:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv  14:                  llama.feed_forward_length u32              = 12288
llama_model_loader: - kv  15:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  16:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  17:                       llama.rope.freq_base f32              = 100000000.000000
llama_model_loader: - kv  18:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  19:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  20:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  21:                           llama.vocab_size u32              = 131072
llama_model_loader: - kv  22:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = tekken
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,131072]  = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,269443]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ �...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  32:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if messages[0]["role"] == "system...
llama_model_loader: - kv  34:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - kv  36:                          general.file_type u32              = 7
llama_model_loader: - type  f32:   73 tensors
llama_model_loader: - type q8_0:  254 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 7.94 GiB (8.50 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 1000
load: token to piece cache size = 0.8498 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 4096
print_info: n_layer          = 36
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 12288
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 100000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 8B
print_info: model params     = 8.02 B
print_info: general.name     = Ministral 8B Instruct 2410
print_info: vocab type       = BPE
print_info: n_vocab          = 131072
print_info: n_merges         = 269443
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: LF token         = 1010 'Ċ'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 150
load_tensors: loading model tensors, this can take a while... (mmap = true)
[New Thread 0x7fffe73fd6c0 (LWP 9051)]
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors:      Vulkan0 model buffer size =  7583.14 MiB
load_tensors:   CPU_Mapped model buffer size =   544.00 MiB
.........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 100000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: Vulkan_Host  output buffer size =     0.50 MiB
llama_kv_cache_unified: kv_size = 4096, type_k = 'f16', type_v = 'f16', n_layer = 36, can_shift = 1, padding = 32
llama_kv_cache_unified:    Vulkan0 KV buffer size =   576.00 MiB
llama_kv_cache_unified: KV self size  =  576.00 MiB, K (f16):  288.00 MiB, V (f16):  288.00 MiB
llama_context:    Vulkan0 compute buffer size =   296.00 MiB
llama_context: Vulkan_Host compute buffer size =    16.01 MiB
llama_context: graph nodes  = 1230
llama_context: graph splits = 2
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[New Thread 0x7fffe7bff6c0 (LWP 9053)]
[New Thread 0x7fffe48b86c0 (LWP 9054)]
[New Thread 0x7fffd66fa6c0 (LWP 9055)]
[New Thread 0x7fffd5ef96c0 (LWP 9056)]
[New Thread 0x7fffd56f86c0 (LWP 9057)]
[New Thread 0x7fffd4ef76c0 (LWP 9058)]
[New Thread 0x7fff7f05c6c0 (LWP 9059)]
[New Thread 0x7fff7e85b6c0 (LWP 9060)]
[New Thread 0x7fff7e05a6c0 (LWP 9061)]
[New Thread 0x7fff7d8596c0 (LWP 9062)]
[Thread 0x7fffe7bff6c0 (LWP 9053) exited]
[New Thread 0x7fff7d0586c0 (LWP 9063)]
[New Thread 0x7fff7c8576c0 (LWP 9064)]
[New Thread 0x7fff4ffff6c0 (LWP 9065)]
[New Thread 0x7fff4f7fe6c0 (LWP 9066)]
[Thread 0x7fffe48b86c0 (LWP 9054) exited]
[Thread 0x7fff7d0586c0 (LWP 9063) exited]
[Thread 0x7fffd4ef76c0 (LWP 9058) exited]
[Thread 0x7fffd56f86c0 (LWP 9057) exited]
[New Thread 0x7fff4effd6c0 (LWP 9067)]
[Thread 0x7fff7e85b6c0 (LWP 9060) exited]
[Thread 0x7fff7f05c6c0 (LWP 9059) exited]
[Thread 0x7fff7e05a6c0 (LWP 9061) exited]
[Thread 0x7fff7d8596c0 (LWP 9062) exited]
[Thread 0x7fffd5ef96c0 (LWP 9056) exited]
[New Thread 0x7fff4e7fc6c0 (LWP 9068)]
[Thread 0x7fff4ffff6c0 (LWP 9065) exited]
[Thread 0x7fffd66fa6c0 (LWP 9055) exited]
[Thread 0x7fff4f7fe6c0 (LWP 9066) exited]
[Thread 0x7fff7c8576c0 (LWP 9064) exited]
[Thread 0x7fff4e7fc6c0 (LWP 9068) exited]
[Thread 0x7fff4effd6c0 (LWP 9067) exited]
main: llama threadpool init, n_threads = 8
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
main: chat template example:
[INST]You are a helpful assistant

Hello[/INST]Hi there</s>[INST]How are you?[/INST]

system_info: n_threads = 8 (n_threads_batch = 8) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

main: interactive mode on.
sampler seed: 1577293089
sampler params: 
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
        top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.
 - Not using system message. To change it, set a different value via -sys PROMPT

pabpas avatar May 12 '25 16:05 pabpas

So it gets stuck, but it doesn't crash or do anything else? You should still be able to get a callstack if you press Ctrl+C.

slaren avatar May 12 '25 17:05 slaren

After pressing Ctrl+C:

Thread 1 "llama-cli" received signal SIGINT, Interrupt.
0x00007ffff55b49ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007ffff55b49ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff55a9668 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff55a96ad in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff561dea6 in read () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff559eb8d in _IO_wfile_underflow ()
   from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff559d2db in _IO_wdefault_uflow ()
   from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff559b785 in getwchar () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00005555557522d3 in console::getchar32 ()
    at /home/user/src/llama.cpp-vulkan/common/console.cpp:197
#8  0x00005555557526f3 in console::readline_advanced (line="", multiline_input=false)
    at /home/user/src/llama.cpp-vulkan/common/console.cpp:368
#9  0x0000555555752b96 in console::readline (line="", multiline_input=false)
    at /home/user/src/llama.cpp-vulkan/common/console.cpp:501
#10 0x00005555555d904d in main (argc=5, argv=0x7fffffffdaa8)
    at /home/user/src/llama.cpp-vulkan/tools/main/main.cpp:857
(gdb) q

Here is also the gdb output of llama-bench, which shows no results but exits without error.

$ gdb --ex run --ex bt --args llama-bench -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37
GNU gdb (Debian 16.3-1) 16.3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama-bench...
Starting program: /usr/local/bin/llama-bench -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: asserts enabled, performance may be affected
warning: debug build, performance may be affected
[New Thread 0x7fffe69ff6c0 (LWP 9974)]
[New Thread 0x7fffe60bd6c0 (LWP 9975)]
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 131072 | int dot: 1 | matrix cores: none
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (Intel(R) Graphics (BMG G21))
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (12th Gen Intel(R) Core(TM) i7-12700K)
[New Thread 0x7fffe801c6c0 (LWP 9983)]
[New Thread 0x7fffe58bc6c0 (LWP 9984)]
[New Thread 0x7fffe50bb6c0 (LWP 9985)]
[New Thread 0x7fffe48ba6c0 (LWP 9986)]
[New Thread 0x7fffa6d3e6c0 (LWP 9987)]
[New Thread 0x7fffa653d6c0 (LWP 9988)]
[New Thread 0x7fffa5d3c6c0 (LWP 9989)]
[Thread 0x7fffe58bc6c0 (LWP 9984) exited]
[New Thread 0x7fffa553b6c0 (LWP 9990)]
[New Thread 0x7fffa4d3a6c0 (LWP 9991)]
[New Thread 0x7fff8ffff6c0 (LWP 9992)]
[Thread 0x7fffa5d3c6c0 (LWP 9989) exited]
[Thread 0x7fffa553b6c0 (LWP 9990) exited]
[Thread 0x7fffa4d3a6c0 (LWP 9991) exited]
[Thread 0x7fffa653d6c0 (LWP 9988) exited]
[Thread 0x7fffa6d3e6c0 (LWP 9987) exited]
[Thread 0x7fffe50bb6c0 (LWP 9985) exited]
[New Thread 0x7fff8f7fe6c0 (LWP 9993)]
[Thread 0x7fff8ffff6c0 (LWP 9992) exited]
[New Thread 0x7fff8effd6c0 (LWP 9994)]
[New Thread 0x7fff8e7fc6c0 (LWP 9995)]
[New Thread 0x7fff8dffb6c0 (LWP 9996)]
[Thread 0x7fff8f7fe6c0 (LWP 9993) exited]
[Thread 0x7fff8effd6c0 (LWP 9994) exited]
[New Thread 0x7fff8d7fa6c0 (LWP 9997)]
[New Thread 0x7fff8cff96c0 (LWP 9998)]
[Thread 0x7fff8dffb6c0 (LWP 9996) exited]
[Thread 0x7fff8e7fc6c0 (LWP 9995) exited]
[New Thread 0x7fff83fff6c0 (LWP 9999)]
[New Thread 0x7fff837fe6c0 (LWP 10000)]
[Thread 0x7fff8d7fa6c0 (LWP 9997) exited]
[Thread 0x7fff8cff96c0 (LWP 9998) exited]
[Thread 0x7fff837fe6c0 (LWP 10000) exited]
[Thread 0x7fff83fff6c0 (LWP 9999) exited]
[Thread 0x7fffe48ba6c0 (LWP 9986) exited]
[New Thread 0x7fff837fe6c0 (LWP 10003)]
[New Thread 0x7fff83fff6c0 (LWP 10004)]
[Thread 0x7fff837fe6c0 (LWP 10003) exited]
[New Thread 0x7fff8cff96c0 (LWP 10005)]
[Thread 0x7fff83fff6c0 (LWP 10004) exited]
[Thread 0x7fff8cff96c0 (LWP 10005) exited]
[New Thread 0x7fff8cff96c0 (LWP 10012)]
[Thread 0x7fff8cff96c0 (LWP 10012) exited]
[Thread 0x7fffe801c6c0 (LWP 9983) exited]
[Thread 0x7fffe60bd6c0 (LWP 9975) exited]
[Thread 0x7fffe69ff6c0 (LWP 9974) exited]
[Inferior 1 (process 9971) exited normally]
No stack.
(gdb) q

pabpas avatar May 12 '25 20:05 pabpas

The first case shows that it is waiting on getwchar for your input, so that seems to be working as expected. You have to type the first line of the dialog. The llama-bench result makes no sense to me; I don't know what could cause the process to exit without error and print nothing. You could try setting a breakpoint on exit with catch syscall exit_group, then use bt to print the callstack.
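A rough sketch of what that could look like, reusing the model path from above as a placeholder:

$ gdb --ex 'catch syscall exit_group' --ex run --ex bt --args llama-bench -m Ministral-8B-Instruct-2410.q8.gguf -ngl 37

gdb should stop when the process calls exit_group, and bt then shows where the exit is coming from.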

slaren avatar May 12 '25 20:05 slaren

It all started with this commit: https://github.com/ggml-org/llama.cpp/commit/cc473cac7cea1484c1f870231073b0bf0352c6f9 Any clue there?

pabpas avatar May 12 '25 20:05 pabpas

No, I don't see any code there that could cause this.

slaren avatar May 12 '25 20:05 slaren

Today Debian trixie updated some Mesa libraries from 25.0.4-1 to 25.0.5-1. After recompiling current master, I am no longer able to reproduce this. It works as expected, so I'm closing.

Thanks for your support @slaren!

pabpas avatar May 13 '25 20:05 pabpas