LocalAI
Modes tts, gpt4, or vision freeze from time to time without timeout
LocalAI version: 2.14.0
Environment, CPU architecture, OS, and Version: Linux Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux
90 GB RAM, 22 vCPUs, NVIDIA L4 24 GB
Describe the bug: Requests freeze from time to time. My logs continuously produce:
2024-05-06T14:09:55.116298748Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
api_1 | 2024-05-06T14:09:55.116344987Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
api_1 | 2024-05-06T14:09:55.116358667Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
api_1 | 2024-05-06T14:09:55.116369597Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
api_1 | 2024-05-06T14:09:55.116380867Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
api_1 | 2024-05-06T14:09:55.116394727Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
api_1 | 2024-05-06T14:09:55.116405866Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
api_1 | 2024-05-06T14:09:55.116433847Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
api_1 | 2024-05-06T14:09:55.116445007Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
Thank you for reporting this issue, DavidGOrtega. We will investigate and try to reproduce the problem in a controlled environment. In the meantime, if you have any other details or find anything that could help, please let us know. We'll get back to you as soon as we have more information or a resolution to the issue. This is an ongoing experiment by @mudler, and we're here to help improve LocalAI.
This happens when the prompt exceeds the context size and there is no more space for the response - looks like something we could handle on our side and fail cleanly instead.
What's your context size window? Can you share your model config/setup?
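For reference, the context window in LocalAI is set per model in its YAML config file. A minimal sketch, assuming the standard config layout (the file name and values here are illustrative, not taken from this report):

```yaml
# models/llama3-8b.yaml (hypothetical example)
name: llama3-8b
parameters:
  model: Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf
context_size: 8192   # n_ctx; prompt plus completion must fit in this window
f16: true
```

If `context_size` is left at a small default, long prompts can exhaust the KV cache exactly as the logs above show.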
@mudler I'm not even using that model, since I use my own, and nothing is apparently requesting it. The only thing I did with that model was install it and then delete it after trying it. Is that model gpt-4?
An easy way to hang the system is to make several requests in a row to the tts endpoint (in my case no more than three) to generate speech for a larger text. It hangs and never times out.
Tested with both piper and bark.
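Until the server fails cleanly on its side, a client-side timeout at least keeps callers from blocking forever on a frozen slot. A minimal sketch using only the standard library; the `/tts` path and the `{"model", "input"}` payload shape are assumptions based on LocalAI's documented TTS endpoint, not taken from this report:

```python
import json
import urllib.error
import urllib.request


def build_tts_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST request for LocalAI's /tts endpoint (payload shape assumed)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tts_with_timeout(base_url: str, model: str, text: str, timeout_s: float = 60.0) -> bytes:
    """Issue the request with a hard client-side timeout so a hung server
    raises an error instead of freezing the caller indefinitely."""
    req = build_tts_request(base_url, model, text)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read()  # raw audio bytes on success
    except (TimeoutError, urllib.error.URLError) as exc:
        raise RuntimeError(f"tts request failed or timed out after {timeout_s}s") from exc
```

Usage would be e.g. `tts_with_timeout("http://localhost:8080", "voice-en-us-ryan-low.onnx", "Hello", 60.0)` (the model name here is a placeholder); after a few back-to-back calls the timeout fires instead of hanging.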
I can confirm this issue. I use LocalAI (v2.14.0) with Orca2. Here are the logs:
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_kv_cache_init: Metal KV buffer size = 1600.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: KV self size = 1600.00 MiB, K (f16): 800.00 MiB, V (f16): 800.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: CPU output buffer size = 0.14 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: Metal compute buffer size = 204.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: CPU compute buffer size = 14.01 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph nodes = 1286
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph splits = 2
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":502,"message":"initializing slots","n_slots":1}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":514,"message":"new slot","slot_id":0,"n_ctx_slot":2048}
5:57PM INF [llama-cpp] Loads OK
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"launch_slot_with_data","line":887,"message":"slot is processing task","slot_id":0,"task_id":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"update_slots","line":1787,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr Context exhausted. Slot 0 released (0 tokens in cache)
...
...
...
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
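The log above shows `n_ctx_slot: 2048`; once the prompt tokens plus the requested completion exceed that, the slot has no KV-cache space left and the retry cascade begins. A rough client-side pre-flight check, assuming a chars-per-token heuristic rather than the model's real tokenizer:

```python
def fits_context(prompt: str, max_tokens: int, n_ctx: int = 2048,
                 chars_per_token: float = 4.0) -> bool:
    """Estimate prompt tokens with a crude chars-per-token heuristic and
    check that prompt + completion fit in the context window (n_ctx).
    This is a sanity check, not an exact tokenizer count."""
    est_prompt_tokens = int(len(prompt) / chars_per_token) + 1
    return est_prompt_tokens + max_tokens <= n_ctx
```

For example, a ~4096-character prompt with `max_tokens=1024` would be rejected against a 2048-token window, while a short prompt passes; requests failing this check are the ones likely to trigger the cascade above.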
I also encountered this issue; did anyone find a solution?
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1