llama.cpp
llama_kv_cache_seq_shift does not work with cache type q4_0
llama_kv_cache_seq_shift or llama_kv_cache_seq_rm (or both) is broken when the K cache type is q4_0.
In main.cpp, these functions are used for "context swapping": removing old tokens from the sequence to make room for new ones.
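For reference, the swap logic in main.cpp looks roughly like this around this build. This is a condensed paraphrase pulled out as a helper for illustration, not the literal upstream code; the values in the comments match the "context full, swapping" line in the log below:

```cpp
#include "llama.h"

// Discards half of the non-kept tokens and shifts the rest back.
// Returns the new n_past. With n_past = 10, n_keep = 1 this gives the
// "n_left = 9, n_discard = 4, after swap n_past = 6" values in the log below.
static int context_swap(llama_context * ctx, int n_past, int n_keep) {
    const int n_left    = n_past - n_keep;
    const int n_discard = n_left/2;

    // remove the oldest n_discard cells after the keep region ...
    llama_kv_cache_seq_rm   (ctx, 0, n_keep, n_keep + n_discard);
    // ... then shift the remaining cells back by n_discard positions;
    // with a q4_0 K cache this shift is what later trips the assert
    llama_kv_cache_seq_shift(ctx, 0, n_keep + n_discard, n_past, -n_discard);

    return n_past - n_discard;
}
```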
My command: ./main -m ../dolphin-2.0-mistral-7b.Q4_K_M.gguf -p "test" -n 50 --cache-type-k q4_0 -c 10
(It works normally without --cache-type-k q4_0.)
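If it helps triage, the failure should also be reproducible directly through the C API. The following is an untested sketch, with signatures as I understand them at this build (e.g. llama_batch_get_one still taking pos/seq_id, and type_k in llama_context_params), mirroring the swap from the log:

```cpp
// Untested repro sketch against the llama.cpp C API around b2232; the API has
// been changing, so signatures (e.g. llama_backend_init) may differ elsewhere.
#include "llama.h"
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(
        "../dolphin-2.0-mistral-7b.Q4_K_M.gguf", mparams);
    if (model == NULL) return 1;

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx  = 10;
    cparams.type_k = GGML_TYPE_Q4_0;  // quantized K cache -- the trigger

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // fill all 10 cache positions, one token per decode
    std::vector<llama_token> tok(10, llama_token_bos(model));
    for (int i = 0; i < 10; ++i) {
        if (llama_decode(ctx, llama_batch_get_one(&tok[i], 1, i, 0)) != 0) return 1;
    }

    // emulate the context swap from the log: keep pos 0, drop 4, shift the rest
    llama_kv_cache_seq_rm   (ctx, 0, 1, 5);
    llama_kv_cache_seq_shift(ctx, 0, 5, 10, -4);

    // the shift is applied lazily; this decode builds the K-shift graph and,
    // with a q4_0 K cache, should abort with the GGML_ASSERT shown below
    llama_decode(ctx, llama_batch_get_one(&tok[0], 1, 6, 0));

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```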
See the logs below for more details:
stdout / stderr
Log start
main: build = 2232 (7fe4678b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1708557165
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ../dolphin-2.0-mistral-7b.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.0-mistral-7b
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = ehartford_dolphin-2.0-mistral-7b
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 4165.37 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 10
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 0.80 MiB
llama_new_context_with_model: KV self size = 0.80 MiB, K (q4_0): 0.18 MiB, V (f16): 0.62 MiB
llama_new_context_with_model: CPU input buffer size = 9.03 MiB
llama_new_context_with_model: CPU compute buffer size = 1.41 MiB
llama_new_context_with_model: graph splits (measure): 1
system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 10, n_batch = 512, n_predict = 50, n_keep = 1
test "Authentication page is displayed" do
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
GGML_ASSERT: ggml.c:12646: false
[1] 2211794 IOT instruction (core dumped)
main.log
[1708557165] Log start
[1708557165] Cmd: ./main -m ../dolphin-2.0-mistral-7b.Q4_K_M.gguf -p test -n 50 --cache-type-k q4_0 -c 10
[1708557165] main: build = 2232 (7fe4678b)
[1708557165] main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
[1708557165] main: seed = 1708557165
[1708557165] main: llama backend init
[1708557165] main: load the model and apply lora adapter, if any
[1708557165] llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ../dolphin-2.0-mistral-7b.Q4_K_M.gguf (version GGUF V2)
[1708557165] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1708557165] llama_model_loader: - kv 0: general.architecture str = llama
[1708557165] llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.0-mistral-7b
[1708557165] llama_model_loader: - kv 2: llama.context_length u32 = 32768
[1708557165] llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
[1708557165] llama_model_loader: - kv 4: llama.block_count u32 = 32
[1708557165] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
[1708557165] llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
[1708557165] llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
[1708557165] llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
[1708557165] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[1708557165] llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
[1708557165] llama_model_loader: - kv 11: general.file_type u32 = 15
[1708557165] llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
[1708557165] llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
[1708557165] llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
[1708557165] llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
[1708557165] llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
[1708557165] llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
[1708557165] llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
[1708557165] llama_model_loader: - kv 19: general.quantization_version u32 = 2
[1708557165] llama_model_loader: - type f32: 65 tensors
[1708557165] llama_model_loader: - type q4_K: 193 tensors
[1708557165] llama_model_loader: - type q6_K: 33 tensors
[1708557165] llm_load_vocab: special tokens definition check successful ( 259/32000 ).
[1708557165] llm_load_print_meta: format = GGUF V2
[1708557165] llm_load_print_meta: arch = llama
[1708557165] llm_load_print_meta: vocab type = SPM
[1708557165] llm_load_print_meta: n_vocab = 32000
[1708557165] llm_load_print_meta: n_merges = 0
[1708557165] llm_load_print_meta: n_ctx_train = 32768
[1708557165] llm_load_print_meta: n_embd = 4096
[1708557165] llm_load_print_meta: n_head = 32
[1708557165] llm_load_print_meta: n_head_kv = 8
[1708557165] llm_load_print_meta: n_layer = 32
[1708557165] llm_load_print_meta: n_rot = 128
[1708557165] llm_load_print_meta: n_embd_head_k = 128
[1708557165] llm_load_print_meta: n_embd_head_v = 128
[1708557165] llm_load_print_meta: n_gqa = 4
[1708557165] llm_load_print_meta: n_embd_k_gqa = 1024
[1708557165] llm_load_print_meta: n_embd_v_gqa = 1024
[1708557165] llm_load_print_meta: f_norm_eps = 0.0e+00
[1708557165] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[1708557165] llm_load_print_meta: f_clamp_kqv = 0.0e+00
[1708557165] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1708557165] llm_load_print_meta: n_ff = 14336
[1708557165] llm_load_print_meta: n_expert = 0
[1708557165] llm_load_print_meta: n_expert_used = 0
[1708557165] llm_load_print_meta: rope scaling = linear
[1708557165] llm_load_print_meta: freq_base_train = 10000.0
[1708557165] llm_load_print_meta: freq_scale_train = 1
[1708557165] llm_load_print_meta: n_yarn_orig_ctx = 32768
[1708557165] llm_load_print_meta: rope_finetuned = unknown
[1708557165] llm_load_print_meta: model type = 7B
[1708557165] llm_load_print_meta: model ftype = Q4_K - Medium
[1708557165] llm_load_print_meta: model params = 7.24 B
[1708557165] llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
[1708557165] llm_load_print_meta: general.name = ehartford_dolphin-2.0-mistral-7b
[1708557165] llm_load_print_meta: BOS token = 1 '<s>'
[1708557165] llm_load_print_meta: EOS token = 2 '</s>'
[1708557165] llm_load_print_meta: UNK token = 0 '<unk>'
[1708557165] llm_load_print_meta: LF token = 13 '<0x0A>'
[1708557165] llm_load_tensors: ggml ctx size = 0.11 MiB
[1708557166] llm_load_tensors: CPU buffer size = 4165.37 MiB
[1708557166] ...............................................................................................
[1708557166] llama_new_context_with_model: n_ctx = 10
[1708557166] llama_new_context_with_model: freq_base = 10000.0
[1708557166] llama_new_context_with_model: freq_scale = 1
[1708557166] llama_kv_cache_init: CPU KV buffer size = 0.80 MiB
[1708557166] llama_new_context_with_model: KV self size = 0.80 MiB, K (q4_0): 0.18 MiB, V (f16): 0.62 MiB
[1708557166] llama_new_context_with_model: CPU input buffer size = 9.03 MiB
[1708557166] llama_new_context_with_model: CPU compute buffer size = 1.41 MiB
[1708557166] llama_new_context_with_model: graph splits (measure): 1
[1708557166] warming up the model with an empty run
[1708557166] n_ctx: 10
[1708557166]
[1708557166] system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
[1708557166] add_bos: 1
[1708557166] tokenize the prompt
[1708557166] prompt: "test"
[1708557166] tokens: [ '':1, ' test':1369 ]
[1708557166] recalculate the cached logits (check): embd_inp.empty() false, n_matching_session_tokens 0, embd_inp.size() 2, session_tokens.size() 0, embd_inp.size() 2
[1708557166] inp_pfx: [ '':1, ' ':28705, '':13, '':13, '###':27332, ' Inst':3133, 'ruction':3112, ':':28747, '':13, '':13 ]
[1708557166] inp_sfx: [ ' ':28705, '':13, '':13, '###':27332, ' Response':12107, ':':28747, '':13, '':13 ]
[1708557166] cml_pfx: [ '':1, ' ':28705, '':13, '<':28789, '|':28766, 'im':321, '_':28730, 'start':2521, '|':28766, '>':28767, 'user':1838, '':13 ]
[1708557166] cml_sfx: [ ' <':523, '|':28766, 'im':321, '_':28730, 'end':416, '|':28766, '>':28767, '':13, '<':28789, '|':28766, 'im':321, '_':28730, 'start':2521, '|':28766, '>':28767, 'ass':489, 'istant':11143, '':13 ]
[1708557166] sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
[1708557166] sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
[1708557166] generate: n_ctx = 10, n_batch = 512, n_predict = 50, n_keep = 1
[1708557166]
[1708557166] embd_inp.size(): 2, n_consumed: 0
[1708557166] eval: [ '':1, ' test':1369 ]
[1708557166] n_past = 2
[1708557166] sampled token: 345: ' "'
[1708557166] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345 ]
[1708557166] n_remain: 49
[1708557166] eval: [ ' "':345 ]
[1708557166] n_past = 3
[1708557166] sampled token: 19504: 'Authentication'
[1708557166] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504 ]
[1708557166] n_remain: 48
[1708557166] eval: [ 'Authentication':19504 ]
[1708557166] n_past = 4
[1708557166] sampled token: 2884: ' page'
[1708557166] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884 ]
[1708557166] n_remain: 47
[1708557166] eval: [ ' page':2884 ]
[1708557167] n_past = 5
[1708557167] sampled token: 349: ' is'
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349 ]
[1708557167] n_remain: 46
[1708557167] eval: [ ' is':349 ]
[1708557167] n_past = 6
[1708557167] sampled token: 13992: ' displayed'
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349, ' displayed':13992 ]
[1708557167] n_remain: 45
[1708557167] eval: [ ' displayed':13992 ]
[1708557167] n_past = 7
[1708557167] sampled token: 28739: '"'
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349, ' displayed':13992, '"':28739 ]
[1708557167] n_remain: 44
[1708557167] eval: [ '"':28739 ]
[1708557167] n_past = 8
[1708557167] sampled token: 511: ' do'
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349, ' displayed':13992, '"':28739, ' do':511 ]
[1708557167] n_remain: 43
[1708557167] eval: [ ' do':511 ]
[1708557167] n_past = 9
[1708557167] sampled token: 13: '
'
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349, ' displayed':13992, '"':28739, ' do':511, '':13 ]
[1708557167] n_remain: 42
[1708557167] eval: [ '':13 ]
[1708557167] n_past = 10
[1708557167] sampled token: 28705: ' '
[1708557167] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':1, ' test':1369, ' "':345, 'Authentication':19504, ' page':2884, ' is':349, ' displayed':13992, '"':28739, ' do':511, '':13, ' ':28705 ]
[1708557167] n_remain: 41
[1708557167] context full, swapping: n_past = 10, n_left = 9, n_ctx = 10, n_keep = 1, n_discard = 4
[1708557167] after swap: n_past = 6, n_past_guidance = 0
[1708557167] embd: [ ' ':28705 ]
[1708557167] clear session path
[1708557167] eval: [ ' ':28705 ]
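For what it's worth, my guess at the mechanism: the seq_shift is applied lazily, so the next llama_decode builds a "K-shift" graph that runs RoPE in place over the K cache tensors. ggml's CPU RoPE only implements F16/F32 sources, so with a q4_0 K it most likely falls into the default GGML_ASSERT(false) branch (plausibly once per thread, matching the eight identical asserts with n_threads = 8). A standalone sketch of the same failure at the ggml level, assuming the bundled ggml API at this build (e.g. ggml_rope still taking an n_ctx argument):

```cpp
// Untested sketch: computing RoPE over a q4_0 tensor should hit the same
// default-case GGML_ASSERT, since the CPU rope kernel only handles F16/F32.
#include "ggml.h"

int main() {
    struct ggml_init_params ip = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    // a toy K cache slice: 128 rot dims x 1 head x 8 positions, quantized
    struct ggml_tensor * k   = ggml_new_tensor_3d(ctx, GGML_TYPE_Q4_0, 128, 1, 8);
    struct ggml_tensor * pos = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 8);
    for (int i = 0; i < 8; ++i) ((int32_t *) pos->data)[i] = i;  // positions

    // same op family the K-shift graph uses (plain variant, mode 0)
    struct ggml_tensor * shifted = ggml_rope(ctx, k, pos, 128, 0, 0);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, shifted);
    ggml_graph_compute_with_ctx(ctx, gf, 1);  // expected: GGML_ASSERT(false)

    ggml_free(ctx);
    return 0;
}
```

If that reading is right, a fix presumably means either dequantizing the K rows before applying the rope or adding quantized paths to the rope kernels in each backend, which fits the note below about backend changes.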
@slaren Sorry to bother you again; I'm leaving this bug report here so you can take a look whenever you have time. It's not urgent. Thank you!
I have opened #5653, but this requires changes in the backends and it is not a priority at the moment.
This issue was closed because it has been inactive for 14 days since being marked as stale.