bug: Performance drop on the second request
I use silicone-maid-7b, but similar behavior is observed on the officially supported openchat-3.5-7b. The first request runs at about 30-40, the second at 1-5 at most. If I reload the model, the first request is fast again. What could be causing this? It does not happen in the stable version 0.4.4. (I use the nightly version.)
- Operating System: Windows 11
- Jan Version: Jan v0.4.4-194
- Processor: AMD Ryzen 5 5600X 6-Core Processor
- RAM: 32GB
- GPU: RTX 3060 12GB
I use nous-capybara-34b.Q5_K_M.gguf with Jan v0.4.4. First request goes fast, second at 0.15, and it crashes on the third request with "Apologies, something's amiss!"
- Operating System: Windows 11
- Jan Version: Jan v0.4.4
- Processor: Intel Core i5 11th Gen
- RAM: 64 GB
- GPU: RTX 3090 24GB
We consider this quite critical. @cmepeo @fu3fi Can you give us logs: https://jan.ai/troubleshooting/how-to-get-error-logs/ Make sure to redact any personal info.
app.log My logs may be broken because I went back from the nightly version to the stable one. After that, the problem disappeared, so I attribute the slow generation to a bug in the nightly.
Could you also tell me what specific sensitive data should be removed from the logs? At a quick glance I didn't find anything that needed to be removed.
app.log Same behavior with Trinity models (IMHO it's not model dependent). I cleared the log I attach here before starting to reproduce, using some basic questions (about the moon landing). Token speed for the first answer was 57.49s, second 33.52s, third 28.07s, fourth 23.65s, fifth 15.06, and the last one under 2s, very slow. During the slow period the GPU and CPU were underutilized (compared with the fast first answer), and VRAM stayed around 6.5GB the whole time (RTX 3070 8GB VRAM, with GPU enabled in settings, obviously). Only one CPU core seemed to work harder than the others, at around 64 to 87% usage, without its temperature rising (oddly, its temperature was similar to the other cores)... Edit:
- Operating System: Windows 11
- Jan Version: Jan v0.4.5 - 210
- Processor: Intel Core i7 10700K
- RAM: 32 GB
- GPU: RTX 3070 8GB
Hi, according to the log, the speed seems to be pretty consistent between chats? Maybe this is not a backend error but a rendering issue that causes the slowdown @louis-jan
I don't know what your development cycle is, but it looks like nightly versions are moving to the stable branch after a while. This problem appears on the brand new 0.4.5. If you roll back to 0.4.4 it disappears. I am attaching the updated log (only three requests). Speeds were 40-25-7 app.log
@Van-QA Pls take a look as well thank you
@tikikun I notice differences in each run (llamacpp log): 16.25 ms per token -> 20.40 -> 16.58 -> 29.45 -> 16.40 -> 35.32 -> 16.46 -> 42.51
Would it be related to the slot/queue system?
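To make the pattern in those numbers easier to see, here is a minimal sketch that converts the per-token times quoted above into tokens per second and splits them into odd and even runs. The values are copied from the comment; the odd/even grouping is only an illustrative way of looking at them, not something the log itself reports.

```python
# ms-per-token values copied from the llama.cpp log quoted above, in run order
ms_per_token = [16.25, 20.40, 16.58, 29.45, 16.40, 35.32, 16.46, 42.51]


def tokens_per_second(ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms


speeds = [tokens_per_second(ms) for ms in ms_per_token]

for i, (ms, tps) in enumerate(zip(ms_per_token, speeds), start=1):
    print(f"run {i}: {ms:6.2f} ms/token -> {tps:5.1f} tokens/s")

# Split into 1st, 3rd, 5th, ... and 2nd, 4th, 6th, ... runs
odd_runs = ms_per_token[0::2]
even_runs = ms_per_token[1::2]
print("odd runs :", odd_runs)
print("even runs:", even_runs)
```

In this data the odd runs stay close to 16.5 ms per token while the even runs get slower each time, which is why it looks like every other run is affected rather than a steady degradation.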
Hi, it's very possible that this is related to CUDA offloading to RAM if we cannot reproduce it on CPU.
Can you try turning this off if possible? It seems to be optional:
https://www.reddit.com/r/StableDiffusion/comments/17km6v0/new_nvidia_driver_makes_offloading_to_ram_optional/
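One way to check the offloading theory is to watch VRAM usage while a slow request is running. The sketch below polls `nvidia-smi` through its `--query-gpu` CSV interface; the helper names (`parse_gpu_line`, `sample_gpu`) are made up for this example, and it assumes `nvidia-smi` is on the PATH. If used VRAM sits well below the total during the slow requests, the driver's system-memory fallback is a less likely explanation, though this is a hint rather than proof.

```python
import subprocess

# Fields requested from nvidia-smi, in this order
QUERY = "memory.used,memory.total,utilization.gpu"


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as '6656, 8192, 35' (MiB, MiB, percent)."""
    used, total, util = (float(part) for part in line.split(","))
    return {
        "used_mib": used,
        "total_mib": total,
        "util_pct": util,
        "used_frac": used / total,
    }


def sample_gpu() -> list[dict]:
    """Query every visible GPU once; requires nvidia-smi to be installed."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.strip().splitlines()]


# Usage (run while Jan is generating a response):
#   for gpu in sample_gpu():
#       print(gpu)
```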
I did that as a global setting in the NVIDIA Control Panel, since I wasn't sure which application/process I needed to set and it was safer to do it globally (French, sorry).
I rebooted the PC before launching Jan.
No difference, performance still drops (log attached, cleared before the experiment).
Token time: 58.82, 58.46, 37.94, 24.20, 17.35, 14.73, 9.27, 9.73, 7.27 and 6.44.
First screenshot:
Last screenshot:
I have all the screenshots in between if needed.
Let me know if you want me to make other tests.
same problem app.log
I made a comment on Discord, but then I found this issue, which shows behavior similar to what I am seeing. I am running macOS Sonoma 14.3 on an M3 Max MBP. I am running the latest nightly, but the issue also shows up with other versions (including stable). I monitor GPU usage with asitop.
When running a model via the Oobabooga text UI, I see an almost constant GPU usage of 98%. When running a model via Jan, it varies greatly. For the first prompt it is sometimes around 98% as well, but mostly I see GPU usage between 30% and 60%, constantly going up and down. After the prompt is finished, I do see 99% usage, as if a cache were being emptied or something. This can take quite a while, even though the output has already been generated. Oobabooga doesn't show this behaviour and only shows GPU usage while generating a response. I tried different values for the ngl and memlock parameters, but it doesn't seem to make a difference.
I ran the same conversation via Oobabooga and got the following results with regard to speed. Also, Jan hung after a few questions, which doesn't happen via Oobabooga, where I can just continue the conversation.
Oobabooga:
`llama_model_loader: loaded meta data with 26 key-value pairs and 995 tensors from models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mixtral-8x7b-instruct-v0.1
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.expert_count u32 = 8
llama_model_loader: - kv 10: llama.expert_used_count u32 = 2
llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: general.file_type u32 = 18
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32000] = ["", "", "<0x00>", "<...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 24: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 32 tensors
llama_model_loader: - type q8_0: 64 tensors
llama_model_loader: - type q6_K: 834 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 8
llm_load_print_meta: n_expert_used = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 46.70 B
llm_load_print_meta: model size = 35.74 GiB (6.57 BPW)
llm_load_print_meta: general.name = mistralai_mixtral-8x7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 '
14:20:59-133827 INFO TRUNCATION LENGTH: 32768
14:20:59-134076 INFO INSTRUCTION TEMPLATE: Custom (obtained from model metadata)
14:20:59-134350 INFO Loaded the model in 1.42 seconds.
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 25.57 ms / 335 runs ( 0.08 ms per token, 13099.75 tokens per second)
llama_print_timings: prompt eval time = 2279.87 ms / 189 tokens ( 12.06 ms per token, 82.90 tokens per second)
llama_print_timings: eval time = 12493.18 ms / 334 runs ( 37.40 ms per token, 26.73 tokens per second)
llama_print_timings: total time = 15261.96 ms / 523 tokens
Output generated in 15.48 seconds (21.64 tokens/s, 335 tokens, context 189, seed 755109104)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 27.96 ms / 370 runs ( 0.08 ms per token, 13235.08 tokens per second)
llama_print_timings: prompt eval time = 3233.17 ms / 369 tokens ( 8.76 ms per token, 114.13 tokens per second)
llama_print_timings: eval time = 14248.54 ms / 369 runs ( 38.61 ms per token, 25.90 tokens per second)
llama_print_timings: total time = 18030.05 ms / 738 tokens
Output generated in 18.22 seconds (20.25 tokens/s, 369 tokens, context 552, seed 1869833238)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 29.06 ms / 381 runs ( 0.08 ms per token, 13110.35 tokens per second)
llama_print_timings: prompt eval time = 3422.84 ms / 413 tokens ( 8.29 ms per token, 120.66 tokens per second)
llama_print_timings: eval time = 15176.34 ms / 380 runs ( 39.94 ms per token, 25.04 tokens per second)
llama_print_timings: total time = 19205.92 ms / 793 tokens
Output generated in 19.42 seconds (19.56 tokens/s, 380 tokens, context 959, seed 433586916)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 9.88 ms / 125 runs ( 0.08 ms per token, 12650.54 tokens per second)
llama_print_timings: prompt eval time = 3414.54 ms / 415 tokens ( 8.23 ms per token, 121.54 tokens per second)
llama_print_timings: eval time = 4964.68 ms / 124 runs ( 40.04 ms per token, 24.98 tokens per second)
llama_print_timings: total time = 8578.33 ms / 539 tokens
Output generated in 8.78 seconds (14.13 tokens/s, 124 tokens, context 1368, seed 2096408388)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 10.03 ms / 129 runs ( 0.08 ms per token, 12856.29 tokens per second)
llama_print_timings: prompt eval time = 2188.21 ms / 165 tokens ( 13.26 ms per token, 75.40 tokens per second)
llama_print_timings: eval time = 5262.53 ms / 128 runs ( 41.11 ms per token, 24.32 tokens per second)
llama_print_timings: total time = 7660.03 ms / 293 tokens
Output generated in 7.90 seconds (16.21 tokens/s, 128 tokens, context 1527, seed 1173716521)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 9.48 ms / 112 runs ( 0.08 ms per token, 11814.35 tokens per second)
llama_print_timings: prompt eval time = 2238.92 ms / 169 tokens ( 13.25 ms per token, 75.48 tokens per second)
llama_print_timings: eval time = 4513.84 ms / 111 runs ( 40.67 ms per token, 24.59 tokens per second)
llama_print_timings: total time = 6928.64 ms / 280 tokens
Output generated in 7.21 seconds (15.39 tokens/s, 111 tokens, context 1690, seed 1594632536)
Llama.generate: prefix-match hit
llama_print_timings: load time = 2280.25 ms
llama_print_timings: sample time = 13.73 ms / 171 runs ( 0.08 ms per token, 12456.29 tokens per second)
llama_print_timings: prompt eval time = 2095.17 ms / 133 tokens ( 15.75 ms per token, 63.48 tokens per second)
llama_print_timings: eval time = 6986.66 ms / 170 runs ( 41.10 ms per token, 24.33 tokens per second)
llama_print_timings: total time = 9364.93 ms / 303 tokens
Output generated in 9.56 seconds (17.78 tokens/s, 170 tokens, context 1817, seed 1477773945)`
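For comparing the two tools, timing lines like the ones above can be reduced to one generation speed per request. This is a rough sketch, assuming the `eval time = X ms / N runs` wording used by both `llama_print_timings` and Nitro's `print_timings`; the function name `eval_speeds` is made up for this example, and the two sample lines are copied from the logs in this comment.

```python
import re

# Matches "eval time = 12493.18 ms / 334 runs" but not "prompt eval time = ..."
EVAL_RE = re.compile(
    r"(?<!prompt )eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*runs"
)


def eval_speeds(log_text: str) -> list[float]:
    """Return generation speed in tokens/s for every 'eval time' entry found."""
    speeds = []
    for ms, runs in EVAL_RE.findall(log_text):
        speeds.append(int(runs) / (float(ms) / 1000.0))
    return speeds


sample = (
    "llama_print_timings: prompt eval time = 2279.87 ms / 189 tokens "
    "( 12.06 ms per token, 82.90 tokens per second) "
    "llama_print_timings: eval time = 12493.18 ms / 334 runs "
    "( 37.40 ms per token, 26.73 tokens per second) "
    "print_timings: eval time = 35994.71 ms / 329 runs "
    "( 109.41 ms per token, 9.14 tokens per second)"
)

print([round(speed, 2) for speed in eval_speeds(sample)])
```

Feeding a whole log file through `eval_speeds` gives a per-request list that makes it easy to spot which requests fall from roughly 25 tokens/s to under 10.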
The conversation timings from Jan's logfile:
` 2024-02-08T12:11:39.510Z [NITRO]::Error: llama_new_context_with_model: n_ctx = 32000 llama_new_context_with_model: freq_base = 1000000.0 llama_new_context_with_model: freq_scale = 1 ggml_metal_init: allocating
2024-02-08T12:11:39.510Z [NITRO]::Error: ggml_metal_init: found device: Apple M3 Max
2024-02-08T12:11:39.510Z [NITRO]::Error: ggml_metal_init: picking default device: Apple M3 Max
2024-02-08T12:11:39.512Z [NITRO]::Error: ggml_metal_init: default.metallib not found, loading from source
2024-02-08T12:11:39.512Z [NITRO]::Error: ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil ggml_metal_init: loading '/Users/vincent/jan/extensions/@janhq/inference-nitro-extension/dist/bin/mac-arm64/ggml-metal.metal'
2024-02-08T12:11:40.042Z [NITRO]::Error: ggml_metal_init: GPU name: Apple M3 Max ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_init: simdgroup reduction support = true ggml_metal_init: simdgroup matrix mul. support = true ggml_metal_init: hasUnifiedMemory = true ggml_metal_init: recommendedMaxWorkingSetSize = 103079.22 MB
2024-02-08T12:11:41.729Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 4000.00 MiB, (40503.52 / 98304.00)
2024-02-08T12:11:41.971Z [NITRO]::Error: llama_kv_cache_init: Metal KV buffer size = 4000.00 MiB llama_new_context_with_model: KV self size = 4000.00 MiB, K (f16): 2000.00 MiB, V (f16): 2000.00 MiB llama_new_context_with_model: CPU input buffer size = 70.63 MiB
2024-02-08T12:11:41.971Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, (40503.53 / 98304.00)
2024-02-08T12:11:41.973Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 2326.00 MiB, (42829.52 / 98304.00)
2024-02-08T12:11:41.973Z [NITRO]::Error: llama_new_context_with_model: Metal compute buffer size = 2325.99 MiB llama_new_context_with_model: CPU compute buffer size = 8.80 MiB llama_new_context_with_model: graph splits (measure): 3
2024-02-08T12:11:43.254Z [NITRO]::Debug: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 593][ initialize] Available slots: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 601][ initialize] -> Slot 0 - max context: 32000
2024-02-08T12:11:43.254Z [NITRO]::Debug: 20240208 12:11:43.254823 UTC 4863913 INFO Started background task here! - llamaCPP.cc:572 [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1541][ update_slots] all slots are idle and system prompt is empty, clear the KV cache
2024-02-08T12:11:43.255Z [NITRO]::Debug: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 0]
2024-02-08T12:11:43.255Z [NITRO]::Debug: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-08T12:11:43.493Z [NITRO]::Debug: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 130.41 ms / 2 tokens ( 65.21 ms per token, 15.34 tokens per second) [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 107.27 ms / 4 runs ( 26.82 ms per token, 37.29 tokens per second) [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 237.69 ms
2024-02-08T12:11:43.493Z [NITRO]::Debug: [1707394303] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (7 tokens in cache)
2024-02-08T12:11:43.498Z [NITRO]::Debug: Load model success with response {} 2024-02-08T12:11:43.500Z [NITRO]::Debug: Validate model state with response 200 2024-02-08T12:11:43.501Z [NITRO]::Debug: Validate model state success with response {"model_data":"{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","n_ctx":32000,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false}","model_loaded":true} 2024-02-08T12:12:23.562Z [NITRO]::Debug: 20240208 12:11:43.493077 UTC 4863913 INFO {"content":" everyone! I have 
a","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","n_ctx":32000,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":107.274,"predicted_n":4,"predicted_per_second":37.287693196860374,"predicted_per_token_ms":26.8185,"prompt_ms":130.411,"prompt_n":2,"prompt_per_second":15.336129620967556,"prompt_per_token_ms":65.2055},"tokens_cached":6,"tokens_evaluated":2,"tokens_predicted":4,"truncated":false} - llamaCPP.cc:133 20240208 12:12:23.562325 UTC 4863915 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707394343] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 1]
2024-02-08T12:12:23.562Z [NITRO]::Debug: [1707394343] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-08T12:12:44.279Z [NITRO]::Debug: [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 1995.80 ms / 110 tokens ( 18.14 ms per token, 55.12 tokens per second) [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 18720.77 ms / 462 runs ( 40.52 ms per token, 24.68 tokens per second) [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 20716.57 ms [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (573 tokens in cache)
2024-02-08T12:12:44.279Z [NITRO]::Debug: 20240208 12:12:44.279052 UTC 4863915 INFO reached result stop - llamaCPP.cc:354 20240208 12:12:44.279082 UTC 4863915 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (573 tokens in cache) [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 3] [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-08T12:12:44.279Z [NITRO]::Debug: [1707394364] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-08T12:12:59.399Z [NITRO]::Debug: [1707394379] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707394379] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 816.09 ms / 110 tokens ( 7.42 ms per token, 134.79 tokens per second) [1707394379] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 14304.35 ms / 379 runs ( 37.74 ms per token, 26.50 tokens per second) [1707394379] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 15120.43 ms
2024-02-08T12:12:59.399Z [NITRO]::Debug: [1707394379] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (490 tokens in cache)
2024-02-08T12:13:20.333Z [NITRO]::Debug: 20240208 12:13:20.333613 UTC 4863915 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707394400] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 6]
2024-02-08T12:13:20.335Z [NITRO]::Debug: [1707394400] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-08T12:14:01.180Z [NITRO]::Debug: [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 4852.31 ms / 601 tokens ( 8.07 ms per token, 123.86 tokens per second) [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 35994.71 ms / 329 runs ( 109.41 ms per token, 9.14 tokens per second) [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 40847.02 ms [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (931 tokens in cache)
2024-02-08T12:14:01.181Z [NITRO]::Debug: 20240208 12:14:01.180823 UTC 4863915 INFO reached result stop - llamaCPP.cc:354 20240208 12:14:01.180856 UTC 4863915 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (931 tokens in cache) [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 9] [1707394441] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (931 tokens in cache)
2024-02-09T13:08:45.960Z [NITRO]::Debug: 20240209 13:08:45.959838 UTC 4863916 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484125] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 11]
2024-02-09T13:08:45.961Z [NITRO]::Debug: [1707484125] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:09:32.947Z [NITRO]::Debug: [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 6708.91 ms / 969 tokens ( 6.92 ms per token, 144.43 tokens per second) [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 40278.11 ms / 394 runs ( 102.23 ms per token, 9.78 tokens per second) [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 46987.01 ms [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1364 tokens in cache)
2024-02-09T13:09:32.947Z [NITRO]::Debug: 20240209 13:09:32.947031 UTC 4863916 INFO reached result stop - llamaCPP.cc:354 20240209 13:09:32.947058 UTC 4863916 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1364 tokens in cache) [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 13] [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:09:32.948Z [NITRO]::Debug: [1707484172] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:09:57.168Z [NITRO]::Debug: [1707484197] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484197] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 5280.75 ms / 969 tokens ( 5.45 ms per token, 183.50 tokens per second) [1707484197] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 18940.88 ms / 452 runs ( 41.90 ms per token, 23.86 tokens per second) [1707484197] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 24221.63 ms [1707484197] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1422 tokens in cache)
2024-02-09T13:10:33.072Z [NITRO]::Debug: 20240209 13:10:33.071736 UTC 4863917 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484233] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 16]
2024-02-09T13:10:33.074Z [NITRO]::Debug: [1707484233] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:11:04.409Z [NITRO]::Debug: [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 8850.84 ms / 1392 tokens ( 6.36 ms per token, 157.27 tokens per second) [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 22486.06 ms / 216 runs ( 104.10 ms per token, 9.61 tokens per second) [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 31336.90 ms [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1609 tokens in cache)
2024-02-09T13:11:04.409Z [NITRO]::Debug: 20240209 13:11:04.408821 UTC 4863917 INFO reached result stop - llamaCPP.cc:354 20240209 13:11:04.408853 UTC 4863917 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1609 tokens in cache) [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 18] [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:11:04.410Z [NITRO]::Debug: [1707484264] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:11:21.458Z [NITRO]::Debug: [1707484281] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484281] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 7499.45 ms / 1392 tokens ( 5.39 ms per token, 185.61 tokens per second) [1707484281] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 9549.99 ms / 228 runs ( 41.89 ms per token, 23.87 tokens per second) [1707484281] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 17049.44 ms
2024-02-09T13:11:21.458Z [NITRO]::Debug: [1707484281] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1621 tokens in cache)
2024-02-09T13:11:27.604Z [NITRO]::Debug: 20240209 13:11:27.603667 UTC 4863917 INFO Clean cache threshold reached! - llamaCPP.cc:169 20240209 13:11:27.603895 UTC 4863917 INFO Cache cleaned - llamaCPP.cc:171 20240209 13:11:27.603931 UTC 4863917 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484287] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 21]
2024-02-09T13:11:27.607Z [NITRO]::Debug: [1707484287] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:12:02.279Z [NITRO]::Debug: [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 10632.11 ms / 1643 tokens ( 6.47 ms per token, 154.53 tokens per second) [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 24043.01 ms / 215 runs ( 111.83 ms per token, 8.94 tokens per second) [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 34675.13 ms [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1859 tokens in cache)
2024-02-09T13:12:02.279Z [NITRO]::Debug: 20240209 13:12:02.279269 UTC 4863917 INFO reached result stop - llamaCPP.cc:354 [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1859 tokens in cache) 20240209 13:12:02.279294 UTC 4863917 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 23] [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:12:02.281Z [NITRO]::Debug: [1707484322] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:12:21.678Z [NITRO]::Debug: [1707484341] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484341] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 9168.33 ms / 1643 tokens ( 5.58 ms per token, 179.20 tokens per second) [1707484341] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 10230.34 ms / 229 runs ( 44.67 ms per token, 22.38 tokens per second) [1707484341] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 19398.68 ms [1707484341] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1873 tokens in cache)
2024-02-09T13:16:04.521Z [NITRO]::Debug: 20240209 13:16:04.520875 UTC 4863918 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484564] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 26]
2024-02-09T13:16:04.523Z [NITRO]::Debug: [1707484564] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:16:38.566Z [NITRO]::Debug: [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 11571.90 ms / 1894 tokens ( 6.11 ms per token, 163.67 tokens per second) [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 22473.04 ms / 165 runs ( 136.20 ms per token, 7.34 tokens per second) [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 34044.94 ms
2024-02-09T13:16:38.566Z [NITRO]::Debug: [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2060 tokens in cache) 20240209 13:16:38.565988 UTC 4863918 INFO reached result stop - llamaCPP.cc:354 20240209 13:16:38.566013 UTC 4863918 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2060 tokens in cache) [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 28] [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:16:38.568Z [NITRO]::Debug: [1707484598] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:16:48.830Z [NITRO]::Debug: 20240209 13:16:44.291119 UTC 4863918 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484608] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:17:00.811Z [NITRO]::Debug: 20240209 13:16:48.947207 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:49.452280 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:49.957318 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:50.460123 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:50.965151 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:51.470196 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:51.975217 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:52.476765 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:52.981789 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:53.486825 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:53.991853 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:54.496020 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:55.001061 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:55.506078 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:56.011092 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:56.516120 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:57.021146 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:57.526171 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:58.031198 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:58.536229 UTC 4863918 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:59.041252 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:16:59.546274 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:00.051296 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:00.556312 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 [1707484620] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484620] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 10425.15 ms / 1894 tokens ( 5.50 ms per token, 181.68 tokens per second) [1707484620] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 11819.90 ms / 268 runs ( 44.10 ms per token, 22.67 tokens per second) [1707484620] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 22245.05 ms [1707484620] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2163 tokens in cache)
2024-02-09T13:17:41.351Z [NITRO]::Debug: 20240209 13:17:01.061357 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:01.561942 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:02.064131 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:02.566253 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:03.071131 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:03.572668 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:04.074167 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:04.579217 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:05.084250 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:05.584508 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:06.084763 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:06.589828 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:07.090663 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:07.595709 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:08.096794 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:08.601160 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:09.101618 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:09.606669 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:10.111461 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:10.616516 UTC 4863918 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:11.121550 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:11.624688 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:12.129340 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:12.631023 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:13.131272 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:13.636292 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:14.140160 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:14.645222 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:15.146762 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:15.650648 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:16.154307 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:16.659399 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:17.164000 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:17.669040 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:18.178167 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:18.683212 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:19.187530 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:19.692586 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:20.196732 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:17:20.701773 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:21.204722 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:21.708723 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:22.213750 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:22.718792 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:23.221878 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:23.726228 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:24.228180 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:24.732609 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:25.236706 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:25.741640 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:26.245665 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:26.749041 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:27.254121 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:27.759198 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:28.261408 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:28.762943 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:29.266638 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:29.771736 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:30.271885 UTC 4863918 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 20240209 13:17:30.776377 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:31.279397 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:31.784464 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:32.285542 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:32.790585 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:33.295380 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:33.801790 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:34.302253 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:34.805086 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:35.310116 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:35.812654 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:36.317722 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:36.817938 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:37.319922 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:37.824891 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:38.329248 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:38.834300 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:39.337937 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:39.841499 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:40.342810 UTC 4863918 INFO Waiting for task to be 
released status:1 - llamaCPP.cc:366 20240209 13:17:40.847275 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:41.350998 UTC 486 2024-02-09T13:17:59.599Z [NITRO]::Debug: 3918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:41.855292 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:42.360335 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:42.864801 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:43.369827 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:43.873916 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:44.378963 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:44.884012 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:45.389046 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:45.889728 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:46.395279 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:46.895987 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:47.401053 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:47.906086 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:48.407472 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:48.907878 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:49.412921 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:49.917603 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 
20240209 13:17:50.422772 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:50.924161 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:51.429223 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:51.931940 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:52.436974 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:52.942010 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:53.446608 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:53.951632 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:54.453386 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:54.958456 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:55.463502 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:55.968541 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:56.473588 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:56.977096 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:57.482166 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:57.987047 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:58.491196 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:58.996249 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:59.500872 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:17:59.599204 UTC 4863919 DEBUG [makeHeaderString] send stream with 
transfer-encoding chunked - HttpResponseImpl.cc:535 [1707484679] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 32]
2024-02-09T13:17:59.603Z [NITRO]::Debug: [1707484679] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:18:40.371Z [NITRO]::Debug: 20240209 13:18:00.005919 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:00.507410 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:01.009636 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:01.512163 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:02.017188 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:02.522217 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:03.027238 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:03.532268 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:04.037291 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:04.542315 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:05.047337 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:05.552359 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:06.056821 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:06.561844 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:07.066860 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:07.571889 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:08.074731 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:08.579745 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:09.084765 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:09.589782 UTC 4863918 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:10.094794 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:10.599825 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:11.104858 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:11.609886 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:12.114899 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:12.619924 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:13.121840 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:13.622485 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:14.127501 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:14.628499 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:15.133515 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:15.638544 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:16.138853 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:16.643874 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:17.148889 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:17.653907 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:18.158933 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:18.663953 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:19.168965 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:18:19.673987 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:20.179001 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:20.684024 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:21.189045 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:21.695058 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:22.200071 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:22.705083 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:23.210090 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:23.715106 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:24.224267 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:24.729286 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:25.234311 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:25.735299 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:26.240311 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:26.745337 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:27.250358 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:27.755375 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:28.260384 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:28.765399 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:29.270419 UTC 4863918 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 20240209 13:18:29.770908 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:30.275935 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:30.780958 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:31.285974 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:31.790991 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:32.296235 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:32.801251 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:33.306266 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:33.811286 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:34.316307 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:34.821328 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:35.328240 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:35.833254 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:36.338265 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:36.843284 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:37.348310 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:37.849682 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:38.351013 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:38.856027 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:39.361039 UTC 4863918 INFO Waiting for task to be 
released status:1 - llamaCPP.cc:366 20240209 13:18:39.866064 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:40.371076 UTC 486 2024-02-09T13:18:47.864Z [NITRO]::Debug: 3918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:40.876103 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:41.381114 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:41.885570 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:42.390588 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:42.895602 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:43.398947 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:43.903968 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:44.408996 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:44.914019 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:45.419033 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:45.924052 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:46.429072 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:46.934093 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:18:47.439101 UTC 4863918 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 13202.90 ms / 2080 tokens ( 6.35 ms per token, 157.54 tokens per 
second) [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 35062.22 ms / 246 runs ( 142.53 ms per token, 7.02 tokens per second) [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 48265.12 ms [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2327 tokens in cache)
2024-02-09T13:18:47.864Z [NITRO]::Debug: 20240209 13:18:47.864553 UTC 4863919 INFO reached result stop - llamaCPP.cc:354 20240209 13:18:47.864580 UTC 4863919 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2327 tokens in cache) [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 34] [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:18:47.867Z [NITRO]::Debug: [1707484727] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:18:59.668Z [NITRO]::Debug: 20240209 13:18:47.944126 UTC 4863918 INFO Failing retrying now - llamaCPP.cc:375 20240209 13:18:47.944178 UTC 4863918 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707484739] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:19:10.899Z [NITRO]::Debug: [1707484750] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707484750] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 11845.05 ms / 2080 tokens ( 5.69 ms per token, 175.60 tokens per second) [1707484750] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 11189.24 ms / 244 runs ( 45.86 ms per token, 21.81 tokens per second) [1707484750] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 23034.29 ms [1707484750] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2325 tokens in cache)
2024-02-09T13:20:05.525Z [NITRO]::Debug: Request to kill Nitro 2024-02-09T13:20:05.610Z [NITRO]::Error: ggml_metal_free: deallocating
2024-02-09T13:20:05.668Z [NITRO]::Error: warning: failed to munlock buffer: Cannot allocate memory
2024-02-09T13:20:09.434Z [NITRO]::Debug: 20240209 13:20:05.526621 UTC 4863920 INFO Program is exitting, goodbye! - processManager.cc:8 20240209 13:20:05.526671 UTC 4863920 INFO changed to false - llamaCPP.cc:624 20240209 13:20:05.526678 UTC 4865063 INFO Background task stopped! - llamaCPP.cc:614 20240209 13:20:05.526861 UTC 4865063 INFO KV cache cleared! - llamaCPP.cc:616
2024-02-09T13:20:09.441Z [NITRO]::Debug: Nitro process is terminated 2024-02-09T13:20:09.441Z [NITRO]::Debug: Nitro exited with code: 0 `
If this isn't the same issue, I will make a separate one.
Also, let me know if you need any more information.
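For comparing requests in logs like the one above, here is a minimal sketch that pulls the `print_timings` figures out of the pasted text. It assumes the line format shown in these logs (`prompt eval time = ... ms / N tokens (...)` and `eval time = ... ms / N runs (...)`); the function name `parse_timings` and the `sample` string are only illustrative and are not part of Jan or Nitro.

```python
import re

# Matches the two print_timings lines as they appear in the logs above.
# "prompt eval" is listed first so prompt lines are not mistaken for eval lines.
TIMING_RE = re.compile(
    r"print_timings:\s+(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) (?:tokens|runs)"
    r"\s*\(\s*([\d.]+) ms per token,\s*([\d.]+) tokens per second\)"
)


def parse_timings(log_text):
    """Return one dict per print_timings entry, in the order found in the log."""
    rows = []
    for phase, total_ms, count, ms_per_token, tps in TIMING_RE.findall(log_text):
        rows.append(
            {
                "phase": phase,  # "prompt eval" or "eval"
                "total_ms": float(total_ms),
                "count": int(count),  # tokens (prompt) or runs (generation)
                "ms_per_token": float(ms_per_token),
                "tokens_per_second": float(tps),
            }
        )
    return rows


# Sample text copied from one request in the log above.
sample = (
    "[ print_timings] print_timings: prompt eval time = 11845.05 ms / 2080 tokens "
    "( 5.69 ms per token, 175.60 tokens per second) "
    "[ print_timings] print_timings: eval time = 11189.24 ms / 244 runs "
    "( 45.86 ms per token, 21.81 tokens per second)"
)

for row in parse_timings(sample):
    print(f'{row["phase"]:>11}: {row["count"]:5d} @ {row["tokens_per_second"]:.2f} tokens/s')
```

Running this over a whole `app.log` gives one row per phase per request, which makes it easier to see whether the speed reported by the backend itself changes from one request to the next.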
I retried with Stream turned off, and it then appears to run consistently. After a few questions, I turned Stream back on. The first response ran at 20+ T/s, but the next one became much slower, and at one point I had to regenerate a response. Maybe it has something to do with streaming? Here's the log:
`2024-02-09T13:44:01.480Z [NITRO]::CPU informations - 16 2024-02-09T13:44:01.480Z [NITRO]::Debug: Request to kill Nitro 2024-02-09T13:44:01.481Z [NITRO]::Debug: Nitro process is terminated 2024-02-09T13:44:01.481Z [NITRO]::Debug: Spawning Nitro subprocess... 2024-02-09T13:44:01.481Z [NITRO]::Debug: Spawn nitro at path: /Users/vincent/jan/extensions/@janhq/inference-nitro-extension/dist/bin/mac-arm64/nitro, and args: 1,127.0.0.1,3928 2024-02-09T13:44:01.792Z [NITRO]::Debug: Nitro is ready 2024-02-09T13:44:01.792Z [NITRO]::Debug: Loading model with params {"ctx_len":32000,"prompt_template":"[INST] {prompt} [/INST]","llama_model_path":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","ngl":256,"mlock":true,"user_prompt":"[INST] ","ai_prompt":" [/INST]","cpu_threads":16} 2024-02-09T13:44:01.797Z [NITRO]::Debug: 20240209 13:44:01.506833 UTC 5384803 INFO Nitro version: - main.cc:50 20240209 13:44:01.507064 UTC 5384803 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:54 20240209 13:44:01.507064 UTC 5384803 INFO Please load your model - main.cc:55 20240209 13:44:01.507066 UTC 5384803 INFO Number of thread is:16 - main.cc:62 {"timestamp":1707486241,"level":"INFO","function":"loadModelImpl","line":561,"message":"system info","n_threads":16,"total_threads":16,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
2024-02-09T13:44:01.801Z [NITRO]::Error: llama_model_loader: loaded meta data with 26 key-value pairs and 995 tensors from /Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = mistralai_mixtral-8x7b-instruct-v0.1 llama_model_loader: - kv 2: llama.context_length u32 = 32768 llama_model_loader: - kv 3: llama.embedding_length u32 = 4096 llama_model_loader: - kv 4: llama.block_count u32 = 32 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 7: llama.attention.head_count u32 = 32 llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 9: llama.expert_count u32 = 8 llama_model_loader: - kv 10: llama.expert_used_count u32 = 2 llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 13: general.file_type u32 = 18 llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
2024-02-09T13:44:01.804Z [NITRO]::Error: llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32000] = ["", "", "<0x00>", "<...
2024-02-09T13:44:01.810Z [NITRO]::Error: llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
2024-02-09T13:44:01.810Z [NITRO]::Error: llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2 llama_model_loader: - kv 20: tokenizer.ggml.unknown_token_id u32 = 0 llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false llama_model_loader: - kv 24: tokenizer.chat_template str = {{ bos_token }}{% for message in mess... llama_model_loader: - kv 25: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type f16: 32 tensors llama_model_loader: - type q8_0: 64 tensors llama_model_loader: - type q6_K: 834 tensors
2024-02-09T13:44:01.821Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 259/32000 ). llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 32000 llm_load_print_meta: n_merges = 0 llm_load_print_meta: n_ctx_train = 32768 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 8 llm_load_print_meta: n_expert_used = 2 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 1000000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_yarn_orig_ctx = 32768 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: model type = 7B
2024-02-09T13:44:01.821Z [NITRO]::Error: llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 46.70 B
llm_load_print_meta: model size = 35.74 GiB (6.57 BPW)
llm_load_print_meta: general.name = mistralai_mixtral-8x7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 '
2024-02-09T13:44:01.821Z [NITRO]::Error: llm_load_tensors: ggml ctx size = 0.76 MiB
2024-02-09T13:44:01.868Z [NITRO]::Error: ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 36497.56 MiB, (36497.64 / 98304.00) llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: CPU buffer size = 102.54 MiB llm_load_tensors: Metal buffer size = 36497.56 MiB
2024-02-09T13:44:01.880Z [NITRO]::Error: . 2024-02-09T13:44:01.891Z [NITRO]::Error: . 2024-02-09T13:44:01.902Z [NITRO]::Error: . 2024-02-09T13:44:01.913Z [NITRO]::Error: . 2024-02-09T13:44:01.924Z [NITRO]::Error: . 2024-02-09T13:44:01.937Z [NITRO]::Error: . 2024-02-09T13:44:01.947Z [NITRO]::Error: . 2024-02-09T13:44:01.956Z [NITRO]::Error: . 2024-02-09T13:44:01.966Z [NITRO]::Error: . 2024-02-09T13:44:01.978Z [NITRO]::Error: . 2024-02-09T13:44:01.988Z [NITRO]::Error: . 2024-02-09T13:44:01.998Z [NITRO]::Error: . 2024-02-09T13:44:02.008Z [NITRO]::Error: . 2024-02-09T13:44:02.019Z [NITRO]::Error: . 2024-02-09T13:44:02.030Z [NITRO]::Error: . 2024-02-09T13:44:02.040Z [NITRO]::Error: . 2024-02-09T13:44:02.051Z [NITRO]::Error: . 2024-02-09T13:44:02.062Z [NITRO]::Error: . 2024-02-09T13:44:02.072Z [NITRO]::Error: . 2024-02-09T13:44:02.083Z [NITRO]::Error: . 2024-02-09T13:44:02.093Z [NITRO]::Error: . 2024-02-09T13:44:02.103Z [NITRO]::Error: . 2024-02-09T13:44:02.114Z [NITRO]::Error: . 2024-02-09T13:44:02.124Z [NITRO]::Error: . 2024-02-09T13:44:02.134Z [NITRO]::Error: . 2024-02-09T13:44:02.144Z [NITRO]::Error: . 2024-02-09T13:44:02.155Z [NITRO]::Error: . 2024-02-09T13:44:02.165Z [NITRO]::Error: . 2024-02-09T13:44:02.175Z [NITRO]::Error: . 2024-02-09T13:44:02.186Z [NITRO]::Error: . 2024-02-09T13:44:02.196Z [NITRO]::Error: . 2024-02-09T13:44:02.205Z [NITRO]::Error: . 2024-02-09T13:44:02.216Z [NITRO]::Error: . 2024-02-09T13:44:02.226Z [NITRO]::Error: . 2024-02-09T13:44:02.236Z [NITRO]::Error: . 2024-02-09T13:44:02.246Z [NITRO]::Error: . 2024-02-09T13:44:02.255Z [NITRO]::Error: . 2024-02-09T13:44:02.266Z [NITRO]::Error: . 2024-02-09T13:44:02.276Z [NITRO]::Error: . 2024-02-09T13:44:02.286Z [NITRO]::Error: . 2024-02-09T13:44:02.300Z [NITRO]::Error: . 2024-02-09T13:44:02.310Z [NITRO]::Error: . 2024-02-09T13:44:02.320Z [NITRO]::Error: . 2024-02-09T13:44:02.330Z [NITRO]::Error: . 2024-02-09T13:44:02.340Z [NITRO]::Error: . 2024-02-09T13:44:02.350Z [NITRO]::Error: . 
2024-02-09T13:44:02.360Z [NITRO]::Error: . 2024-02-09T13:44:02.371Z [NITRO]::Error: . 2024-02-09T13:44:02.382Z [NITRO]::Error: . 2024-02-09T13:44:02.392Z [NITRO]::Error: . 2024-02-09T13:44:02.401Z [NITRO]::Error: . 2024-02-09T13:44:02.412Z [NITRO]::Error: . 2024-02-09T13:44:02.423Z [NITRO]::Error: . 2024-02-09T13:44:02.434Z [NITRO]::Error: . 2024-02-09T13:44:02.444Z [NITRO]::Error: . 2024-02-09T13:44:02.454Z [NITRO]::Error: . 2024-02-09T13:44:02.464Z [NITRO]::Error: . 2024-02-09T13:44:02.474Z [NITRO]::Error: . 2024-02-09T13:44:02.484Z [NITRO]::Error: . 2024-02-09T13:44:02.494Z [NITRO]::Error: . 2024-02-09T13:44:02.504Z [NITRO]::Error: . 2024-02-09T13:44:02.514Z [NITRO]::Error: . 2024-02-09T13:44:02.524Z [NITRO]::Error: . 2024-02-09T13:44:02.533Z [NITRO]::Error: . 2024-02-09T13:44:02.543Z [NITRO]::Error: . 2024-02-09T13:44:02.553Z [NITRO]::Error: . 2024-02-09T13:44:02.563Z [NITRO]::Error: . 2024-02-09T13:44:02.573Z [NITRO]::Error: . 2024-02-09T13:44:02.584Z [NITRO]::Error: . 2024-02-09T13:44:02.594Z [NITRO]::Error: . 2024-02-09T13:44:02.603Z [NITRO]::Error: . 2024-02-09T13:44:02.612Z [NITRO]::Error: . 2024-02-09T13:44:02.622Z [NITRO]::Error: . 2024-02-09T13:44:02.631Z [NITRO]::Error: . 2024-02-09T13:44:02.642Z [NITRO]::Error: . 2024-02-09T13:44:02.652Z [NITRO]::Error: . 2024-02-09T13:44:02.662Z [NITRO]::Error: . 2024-02-09T13:44:02.672Z [NITRO]::Error: . 2024-02-09T13:44:02.681Z [NITRO]::Error: . 2024-02-09T13:44:02.691Z [NITRO]::Error: . 2024-02-09T13:44:02.701Z [NITRO]::Error: . 2024-02-09T13:44:02.710Z [NITRO]::Error: . 2024-02-09T13:44:02.720Z [NITRO]::Error: . 2024-02-09T13:44:02.730Z [NITRO]::Error: . 2024-02-09T13:44:02.740Z [NITRO]::Error: . 2024-02-09T13:44:02.753Z [NITRO]::Error: . 2024-02-09T13:44:02.765Z [NITRO]::Error: . 2024-02-09T13:44:02.777Z [NITRO]::Error: . 2024-02-09T13:44:02.790Z [NITRO]::Error: . 2024-02-09T13:44:02.803Z [NITRO]::Error: . 2024-02-09T13:44:02.815Z [NITRO]::Error: . 2024-02-09T13:44:02.828Z [NITRO]::Error: . 
2024-02-09T13:44:02.841Z [NITRO]::Error: . 2024-02-09T13:44:02.852Z [NITRO]::Error: . 2024-02-09T13:44:02.862Z [NITRO]::Error: . 2024-02-09T13:44:02.873Z [NITRO]::Error: . 2024-02-09T13:44:02.885Z [NITRO]::Error: . 2024-02-09T13:44:02.898Z [NITRO]::Error: . 2024-02-09T13:44:02.910Z [NITRO]::Error: . 2024-02-09T13:44:02.922Z [NITRO]::Error: .
2024-02-09T13:44:02.923Z [NITRO]::Error: llama_new_context_with_model: n_ctx = 32000 llama_new_context_with_model: freq_base = 1000000.0 llama_new_context_with_model: freq_scale = 1 ggml_metal_init: allocating
2024-02-09T13:44:02.923Z [NITRO]::Error: ggml_metal_init: found device: Apple M3 Max
2024-02-09T13:44:02.923Z [NITRO]::Error: ggml_metal_init: picking default device: Apple M3 Max
2024-02-09T13:44:02.925Z [NITRO]::Error: ggml_metal_init: default.metallib not found, loading from source
2024-02-09T13:44:02.925Z [NITRO]::Error: ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil ggml_metal_init: loading '/Users/vincent/jan/extensions/@janhq/inference-nitro-extension/dist/bin/mac-arm64/ggml-metal.metal'
2024-02-09T13:44:02.927Z [NITRO]::Error: ggml_metal_init: GPU name: Apple M3 Max ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_init: simdgroup reduction support = true ggml_metal_init: simdgroup matrix mul. support = true ggml_metal_init: hasUnifiedMemory = true ggml_metal_init: recommendedMaxWorkingSetSize = 103079.22 MB
2024-02-09T13:44:02.932Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 4000.00 MiB, (40503.52 / 98304.00)
2024-02-09T13:44:03.186Z [NITRO]::Error: llama_kv_cache_init: Metal KV buffer size = 4000.00 MiB llama_new_context_with_model: KV self size = 4000.00 MiB, K (f16): 2000.00 MiB, V (f16): 2000.00 MiB llama_new_context_with_model: CPU input buffer size = 70.63 MiB
2024-02-09T13:44:03.186Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, (40503.53 / 98304.00)
2024-02-09T13:44:03.188Z [NITRO]::Error: ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 2326.00 MiB, (42829.52 / 98304.00) llama_new_context_with_model: Metal compute buffer size = 2325.99 MiB
2024-02-09T13:44:03.188Z [NITRO]::Error: llama_new_context_with_model: CPU compute buffer size = 8.80 MiB llama_new_context_with_model: graph splits (measure): 3
2024-02-09T13:44:04.580Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 593][ initialize] Available slots: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 601][ initialize] -> Slot 0 - max context: 32000
2024-02-09T13:44:04.581Z [NITRO]::Debug: 20240209 13:44:04.581146 UTC 5384806 INFO Started background task here! - llamaCPP.cc:572 [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1541][ update_slots] all slots are idle and system prompt is empty, clear the KV cache
2024-02-09T13:44:04.581Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 0]
2024-02-09T13:44:04.581Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:44:04.815Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 126.52 ms / 2 tokens ( 63.26 ms per token, 15.81 tokens per second) [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 107.39 ms / 4 runs ( 26.85 ms per token, 37.25 tokens per second) [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 233.91 ms
2024-02-09T13:44:04.815Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (7 tokens in cache)
2024-02-09T13:44:04.817Z [NITRO]::Debug: Load model success with response {} 2024-02-09T13:44:04.818Z [NITRO]::Debug: Validate model state with response 200 2024-02-09T13:44:04.818Z [NITRO]::Debug: Validate model state success with response {"model_data":"{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","n_ctx":32000,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false}","model_loaded":true} 2024-02-09T13:44:04.919Z [NITRO]::Debug: 20240209 13:44:04.815238 UTC 5384806 INFO {"content":"! My name is L","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","n_ctx":32000,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/vincent/jan/models/mixtral-8x7b-instruct/mixtral-8x7b-instruct-v0.1.Q6_K.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":107.391,"predicted_n":4,"predicted_per_second":37.24706912124852,"predicted_per_token_ms":26.84775,"prompt_ms":126.519,"pr
ompt_n":2,"prompt_per_second":15.80790237039496,"prompt_per_token_ms":63.2595},"tokens_cached":6,"tokens_evaluated":2,"tokens_predicted":4,"truncated":false} - llamaCPP.cc:133 20240209 13:44:04.919230 UTC 5384808 INFO sent the non stream, waiting for respone - llamaCPP.cc:391 [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 1]
2024-02-09T13:44:04.919Z [NITRO]::Debug: [1707486244] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:44:42.699Z [NITRO]::Debug: [1707486282] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486282] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 831.60 ms / 110 tokens ( 7.56 ms per token, 132.27 tokens per second) [1707486282] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 36948.09 ms / 905 runs ( 40.83 ms per token, 24.49 tokens per second) [1707486282] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 37779.69 ms [1707486282] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1016 tokens in cache)
2024-02-09T13:45:11.094Z [NITRO]::Debug: 20240209 13:44:42.699059 UTC 5384808 INFO Here is the result:0 - llamaCPP.cc:395 20240209 13:45:11.094793 UTC 5384808 INFO sent the non stream, waiting for respone - llamaCPP.cc:391 [1707486311] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 2]
2024-02-09T13:45:11.096Z [NITRO]::Debug: [1707486311] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:45:53.544Z [NITRO]::Debug: [1707486353] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486353] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 7465.56 ms / 1044 tokens ( 7.15 ms per token, 139.84 tokens per second) [1707486353] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 34983.97 ms / 806 runs ( 43.40 ms per token, 23.04 tokens per second) [1707486353] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 42449.53 ms
2024-02-09T13:45:53.544Z [NITRO]::Debug: [1707486353] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (1851 tokens in cache)
2024-02-09T13:46:10.858Z [NITRO]::Debug: 20240209 13:45:53.544453 UTC 5384808 INFO Here is the result:0 - llamaCPP.cc:395 20240209 13:46:10.857951 UTC 5384808 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707486370] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 3]
2024-02-09T13:46:10.861Z [NITRO]::Debug: [1707486370] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:46:59.078Z [NITRO]::Debug: [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 11584.06 ms / 1889 tokens ( 6.13 ms per token, 163.07 tokens per second) [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 36636.16 ms / 745 runs ( 49.18 ms per token, 20.34 tokens per second) [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 48220.22 ms
2024-02-09T13:46:59.078Z [NITRO]::Debug: [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2635 tokens in cache) 20240209 13:46:59.078352 UTC 5384808 INFO reached result stop - llamaCPP.cc:354 [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2635 tokens in cache) 20240209 13:46:59.078379 UTC 5384808 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318 [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 5] [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:46:59.080Z [NITRO]::Debug: [1707486419] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:47:12.268Z [NITRO]::Debug: 20240209 13:47:12.222950 UTC 5384808 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707486432] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:47:41.660Z [NITRO]::Debug: 20240209 13:47:12.285145 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:12.789923 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:13.294946 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:13.797941 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:14.302969 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:14.804593 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:15.309614 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:15.823856 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:16.328869 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:16.833892 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:17.338901 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:17.843922 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:18.358483 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:18.863505 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:19.368368 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:19.875081 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:20.379918 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:20.892308 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:21.396406 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:21.901265 UTC 5384808 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:22.405133 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:22.910005 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:23.414887 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:23.916923 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:24.421801 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:24.926685 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:25.431570 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:25.936468 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:26.439404 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:26.944301 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:27.449226 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:27.954127 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:28.459039 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:28.963953 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:29.467242 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:29.972169 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:30.477093 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:30.982014 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:31.486943 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:47:31.991896 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:32.495943 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:33.000896 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:33.502970 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:34.007910 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:34.512872 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:35.017813 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:35.522765 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:36.027715 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:36.532679 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:37.037635 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:37.542594 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:38.047552 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:38.551592 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:39.056558 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:39.561523 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:40.066495 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:40.569670 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:41.074638 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:41.579612 UTC 5384808 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 [1707486461] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486461] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 10625.00 ms / 1889 tokens ( 5.62 ms per token, 177.79 tokens per second) [1707486461] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 31956.69 ms / 713 runs ( 44.82 ms per token, 22.31 tokens per second) [1707486461] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 42581.69 ms
2024-02-09T13:47:41.660Z [NITRO]::Debug: [1707486461] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2603 tokens in cache)
2024-02-09T13:47:54.899Z [NITRO]::Debug: 20240209 13:47:42.084668 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:42.589684 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:43.094549 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:43.599531 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:44.104409 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:44.608952 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:45.113967 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:45.618964 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:46.123965 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:46.628893 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:47.129768 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:47.629839 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:48.133355 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:48.637222 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:49.142385 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:49.644803 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:50.148749 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:50.650288 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:51.150487 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:51.655486 UTC 5384808 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:52.159352 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:52.660738 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:53.165772 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:53.670811 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:54.175817 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:54.680843 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:54.899381 UTC 5384809 INFO Clean cache threshold reached! - llamaCPP.cc:169 20240209 13:47:54.899631 UTC 5384809 INFO Cache cleaned - llamaCPP.cc:171 20240209 13:47:54.899654 UTC 5384809 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535 [1707486474] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 9]
2024-02-09T13:47:54.902Z [NITRO]::Debug: [1707486474] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:48:34.391Z [NITRO]::Debug: 20240209 13:47:55.182463 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:55.687462 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:56.192459 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:56.697457 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:57.202468 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:57.707471 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:58.212464 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:58.717464 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:59.222476 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:47:59.727502 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:00.232514 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:00.737518 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:01.242516 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:01.747523 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:02.252524 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:02.757524 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:03.260551 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:03.765565 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:04.270577 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:04.774685 UTC 5384808 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:05.279686 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:05.784692 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:06.289702 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:06.794709 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:07.299710 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:07.804720 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:08.309732 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:08.814748 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:09.319752 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:09.824770 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:10.329788 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:10.831068 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:11.332915 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:11.833207 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:12.336220 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:12.841246 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:13.346265 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:13.850053 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:14.362349 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:48:14.867358 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:15.372371 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:15.877404 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:16.382419 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:16.887202 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:17.392213 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:17.897236 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:18.402249 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:18.907263 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:19.412281 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:19.915575 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:20.420589 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:20.925610 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:21.430626 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:21.935647 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:22.440657 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:22.945667 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:23.449993 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:23.955010 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:24.460031 UTC 5384808 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 20240209 13:48:24.971525 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:25.476540 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:25.981561 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:26.486577 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:26.993083 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:27.498095 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:28.007838 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:28.512851 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:29.017867 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:29.522907 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:30.025641 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:30.530669 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:31.035675 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:31.540697 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:32.045713 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:32.550730 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:33.055198 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:33.555506 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:34.060522 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 [1707486514] 
[/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings]
2024-02-09T13:48:34.391Z [NITRO]::Debug: [1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 16150.85 ms / 2667 tokens ( 6.06 ms per token, 165.13 tokens per second)
[1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 23340.90 ms / 216 runs ( 108.06 ms per token, 9.25 tokens per second)
[1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 39491.75 ms
2024-02-09T13:48:34.413Z [NITRO]::Debug: [1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2884 tokens in cache)
2024-02-09T13:48:34.413Z [NITRO]::Debug: 20240209 13:48:34.412883 UTC 5384809 INFO reached result stop - llamaCPP.cc:354
20240209 13:48:34.412938 UTC 5384809 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318
[1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2884 tokens in cache)
[1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 879][ launch_slot_with_data] slot 0 is processing [task id: 11]
[1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:48:34.416Z [NITRO]::Debug: [1707486514] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1734][ update_slots] slot 0 : kv cache rm - [0, end)
2024-02-09T13:48:49.496Z [NITRO]::Debug: 20240209 13:48:34.562509 UTC 5384808 INFO Failing retrying now - llamaCPP.cc:375
[1707486529] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1462][ process_tasks] slot unavailable
2024-02-09T13:49:04.873Z [NITRO]::Debug: 20240209 13:48:49.655668 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:50.160679 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:50.665695 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:51.170709 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:51.675736 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:52.180758 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:52.685772 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:53.190806 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:53.695830 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:54.200843 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:54.706839 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:55.211857 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:55.716887 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:56.221908 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:56.726928 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:57.231939 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:57.736966 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:58.241991 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:58.747017 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:59.252030 UTC 5384808 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:48:59.757069 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:00.262088 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:00.762272 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:01.265046 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:01.770070 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:02.275098 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:02.782286 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:03.287300 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:03.792320 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:04.292500 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:04.795471 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 [1707486544] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 472][ print_timings] [1707486544] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 477][ print_timings] print_timings: prompt eval time = 15292.39 ms / 2667 tokens ( 5.73 ms per token, 174.40 tokens per second) [1707486544] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 482][ print_timings] print_timings: eval time = 15167.85 ms / 324 runs ( 46.81 ms per token, 21.36 tokens per second) [1707486544] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 484][ print_timings] print_timings: total time = 30460.23 ms [1707486544] [/Users/jan/actions-runner/_work/nitro/nitro/controllers/llamaCPP.h: 1597][ update_slots] slot 0 released (2992 tokens in cache)
2024-02-09T13:49:45.564Z [NITRO]::Debug: 20240209 13:49:05.300509 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:05.805308 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:06.305621 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:06.807298 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:07.308346 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:07.809273 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:08.310853 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:08.812156 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:09.314018 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:09.818775 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:10.323820 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:10.827505 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:11.331834 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:11.832008 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:12.337036 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:12.842087 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:13.347124 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:13.851030 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:14.351658 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:14.856736 UTC 5384808 
INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:15.357093 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:15.862131 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:16.366567 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:16.870796 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:17.375457 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:17.880415 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:18.385448 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:18.889833 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:19.394376 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:19.899438 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:20.399817 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:20.904879 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:21.406312 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:21.907084 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:22.411722 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:22.916763 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:23.421820 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:23.923185 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:24.425277 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:49:24.930316 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:25.431555 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:25.932090 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:26.433823 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:26.938339 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:27.443387 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:27.944516 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:28.447867 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:28.952924 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:29.457394 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:29.962449 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:30.463059 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:30.964763 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:31.465248 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:31.969392 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:32.474446 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:32.978285 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:33.479205 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:33.979903 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:34.481513 UTC 5384808 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 20240209 13:49:34.983436 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:35.488237 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:35.993279 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:36.497364 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:37.002437 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:37.505909 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:38.010935 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:38.515289 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:39.018941 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:39.524006 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:40.027278 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:40.527917 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:41.030474 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:41.530628 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:42.035692 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:42.539598 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:43.043419 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:43.548496 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:44.053539 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:44.555996 UTC 5384808 INFO Waiting for task to be 
released status:1 - llamaCPP.cc:366 20240209 13:49:45.061070 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:45.563415 UTC 538 2024-02-09T13:50:25.852Z [NITRO]::Debug: 4808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:46.065142 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:46.570176 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:47.075234 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:47.575363 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:48.079288 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:48.581716 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:49.086803 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:49.591854 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:50.094388 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:50.599451 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:51.101996 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:51.607044 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:52.112105 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:52.612175 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:53.117145 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:53.622183 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:54.127249 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 
20240209 13:49:54.630661 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:55.133633 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:55.638662 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:56.143719 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:56.646337 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:57.151433 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:57.654392 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:58.158804 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:58.661947 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:59.163974 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:49:59.667121 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:00.169672 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:00.670902 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:01.175961 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:01.681007 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:02.183784 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:02.686788 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:03.191838 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:03.695066 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:04.200111 UTC 5384808 INFO Waiting for task to be released 
status:1 - llamaCPP.cc:366 20240209 13:50:04.703379 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:05.208436 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:05.713218 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:06.214593 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:06.719683 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:07.223612 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:07.728668 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:08.233706 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:08.738767 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:09.243300 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:09.743492 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:10.246588 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:10.751651 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:11.256656 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:11.760418 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:12.264544 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:12.769188 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:13.269535 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:13.774578 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:14.276767 UTC 5384808 INFO Waiting for 
task to be released status:1 - llamaCPP.cc:366 20240209 13:50:14.781824 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:15.282987 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:15.785164 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:16.289188 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:16.794243 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:17.298971 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:17.799130 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:18.300537 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:18.805578 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:19.308480 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:19.809708 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:20.310739 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:20.813753 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:21.318488 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:21.819810 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:22.324864 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:22.829904 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:23.331844 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:23.835807 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:24.339733 UTC 
5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:24.844784 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:25.347655 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:25.851821 UTC 5384808 INFO Waiting for task to b 2024-02-09T13:51:06.119Z [NITRO]::Debug: e released status:1 - llamaCPP.cc:366 20240209 13:50:26.353926 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:26.858522 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:27.361097 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:27.862452 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:28.367524 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:28.872568 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:29.372697 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:29.877730 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:30.380753 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:30.885807 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:31.389369 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:31.891426 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:32.392584 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:32.896948 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:33.397243 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:33.900106 UTC 5384808 INFO Waiting for task to be 
released status:1 - llamaCPP.cc:366 20240209 13:50:34.402137 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:34.905574 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:35.410606 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:35.914589 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:36.419657 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:36.920821 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:37.421027 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:37.926067 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:38.426538 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:38.928629 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:39.433693 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:39.937713 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:40.442755 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:40.945068 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:41.450048 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:41.955085 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:42.458714 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:42.963792 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:43.468427 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:43.973483 UTC 5384808 INFO 
Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:44.474210 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:44.974835 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:45.475470 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:45.975635 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:46.478830 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:46.983875 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:47.488957 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:47.992960 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:48.497995 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:48.998369 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:49.503455 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:50.008523 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:50.510507 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:51.015550 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:51.517857 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:52.018146 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:52.519702 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:53.022768 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:53.527823 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 
13:50:54.027890 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:54.532951 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:55.037984 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:55.541448 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:56.043034 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:56.548132 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:57.051631 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:57.556676 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:58.059720 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:58.564786 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:59.067633 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:50:59.570697 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:00.075766 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:00.576310 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:01.079492 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:01.584186 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:02.089212 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:02.592992 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:03.094798 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:03.599189 UTC 5384808 INFO Waiting for task to be released status:1 - 
llamaCPP.cc:366 20240209 13:51:04.104234 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:04.606266 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:05.111312 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:05.613640 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366 20240209 13:51:06.118279 UTC 5384808 INFO Waiting for task to be released status:1 - llamaCPP.c`
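For anyone comparing logs, a rough way to quantify the stall is to count the "Waiting for task to be released" entries and measure the time between the first and the last one. In the excerpt above they arrive roughly every 0.5 s, from 13:50:04 to 13:51:06. The following is a minimal sketch, not part of Jan or Nitro: the function name `wait_span` is hypothetical, and the regular expression assumes the timestamp layout seen in this log (`YYYYMMDD HH:MM:SS.ffffff UTC`).

```python
import re
from datetime import datetime

# Matches entries such as:
#   20240209 13:50:04.703379 UTC 5384808 INFO Waiting for task to be released status:1
# \s+ is used between fields so that entries wrapped across lines still match.
WAIT_RE = re.compile(
    r"(\d{8})\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+UTC\s+\d+\s+INFO\s+"
    r"Waiting\s+for\s+task\s+to\s+be\s+released"
)


def wait_span(log_text):
    """Return (entry_count, seconds_between_first_and_last_entry)."""
    stamps = [
        datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y%m%d %H:%M:%S.%f")
        for m in WAIT_RE.finditer(log_text)
    ]
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    # Two entries copied from the log above.
    sample = (
        "20240209 13:50:04.703379 UTC 5384808 INFO Waiting for task to be "
        "released status:1 - llamaCPP.cc:366 20240209 13:50:05.208436 UTC "
        "5384808 INFO Waiting for task to be released status:1 - llamaCPP.cc:366"
    )
    count, seconds = wait_span(sample)
    print(f"{count} entries over {seconds:.3f} s")
```

Entries that were split by an interleaved `[NITRO]::Debug:` prefix will not match, so the count is a lower bound.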
@vlbosch could you please try our latest nightly build? Several of us have noticed that the issue seems to be fixed and the app is quite stable, at least on macOS: it no longer gets stuck generating a response.
- Operating System: macOS
- Jan Version: Jan v0.4.6-271 (Nightly Build)
- Processor: M2
- RAM: 32GB
Windows: Updating
I just tried the latest nightly, 0.4.6-273, and both issues appear to be resolved: performance is much more consistent, and GPU usage drops to idle almost immediately after the response finishes generating, instead of staying at 99% for a bit. Thanks for the quick fixes, guys!
- Operating System: Windows
- Jan Version: Jan v0.4.6-273 (Nightly Build)
- Processor: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz 2.90 GHz
- RAM: 16.0 GB (15.8 GB usable)
- GPU: NVIDIA GeForce RTX 2060 6GB
Token speeds remain consistent across requests, averaging around 41.5 tokens per second. GPU usage returns to 0% as soon as the request is completed.
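To check this kind of consistency on other machines, one option is to note the token speed Jan shows for each request and flag any request that falls well below the first one. This is only a sketch: the function name `slowdown_report` and the 50% threshold are arbitrary assumptions, and the sample figures are the speeds reported earlier in this thread for the affected build.

```python
def slowdown_report(speeds, threshold=0.5):
    """Flag requests slower than `threshold` times the first request.

    speeds: token speeds in request order, as shown in the Jan UI.
    Returns a list of (request_number, speed, ratio_to_first_request).
    """
    if not speeds:
        return []
    baseline = speeds[0]
    return [
        (number, speed, speed / baseline)
        for number, speed in enumerate(speeds, start=1)
        if speed < threshold * baseline
    ]


if __name__ == "__main__":
    # Speeds reported earlier in this thread for five consecutive requests.
    reported = [57.49, 33.52, 28.07, 23.65, 15.06]
    for number, speed, ratio in slowdown_report(reported):
        print(f"request {number}: {speed} ({ratio:.0%} of the first request)")
```

With the regression present, later requests get flagged; on a fixed build the list should stay empty.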
I assume this was caused by a strange queueing issue, and we resolved most of that recently. I'd recommend closing this issue, @louis-jan.
Agreed, I will keep an eye on this.