
Mode tts, gpt4 or vision freezes from time to time without timeout

DavidGOrtega opened this issue 9 months ago · 7 comments

LocalAI version: 2.14.0

Environment, CPU architecture, OS, and Version: Linux Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux

90 GB RAM, 22 vCores, NVIDIA L4 24 GB

Describe the bug: Requests freeze from time to time. My logs are continuously producing:

2024-05-06T14:09:55.116298748Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
api_1  | 2024-05-06T14:09:55.116344987Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
api_1  | 2024-05-06T14:09:55.116358667Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
api_1  | 2024-05-06T14:09:55.116369597Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
api_1  | 2024-05-06T14:09:55.116380867Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
api_1  | 2024-05-06T14:09:55.116394727Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
api_1  | 2024-05-06T14:09:55.116405866Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
api_1  | 2024-05-06T14:09:55.116433847Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
api_1  | 2024-05-06T14:09:55.116445007Z 2:09PM DBG GRPC(Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf-127.0.0.1:44249): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
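Since the server never times out on its own here, a client-side timeout is a practical guard while the root cause is investigated. Below is a minimal sketch (not from this thread) that calls the OpenAI-compatible /v1/chat/completions endpoint with a hard timeout so a frozen request raises instead of blocking forever; the base URL and payload shape are assumptions.

```python
# Minimal sketch: client-side timeout guard for requests to LocalAI.
# Assumptions: LocalAI listening on localhost:8080 and exposing the
# OpenAI-compatible /v1/chat/completions endpoint; adjust as needed.
import requests

BASE_URL = "http://localhost:8080"

def chat(prompt: str, timeout_s: float = 60.0) -> str:
    payload = {
        "model": "Llama3-8B-OpenHermes-DPO.Q4_K_M.gguf",  # model name taken from the logs above
        "messages": [{"role": "user", "content": prompt}],
    }
    try:
        # The timeout covers connect and read, so a hung generation
        # surfaces as an exception instead of freezing the client.
        r = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=timeout_s)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
    except requests.Timeout:
        return "<request timed out client-side>"
```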

DavidGOrtega · May 07 '24 13:05

Thank you for reporting this issue, DavidGOrtega. We will investigate and try to reproduce the problem in a controlled environment. In the meantime, if you have any other details or find anything that could help, please let us know. We'll get back to you as soon as we have more information or a resolution to the issue. This is an ongoing experiment by @mudler, and we're here to help improve LocalAI.

localai-bot · May 07 '24 13:05

This happens when the prompt exceeds the context size and there is no space left for the response. It looks like something we could handle on our side and fail cleanly instead.

What's your context window size? Can you share your model config/setup?
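To illustrate the fail-cleanly idea (this is not LocalAI's actual code), a request could be rejected up front when the prompt plus the requested completion cannot fit in the context window, instead of letting llama.cpp thrash the KV cache. A rough sketch, assuming a simple 4-characters-per-token heuristic and a 2048-token window like the n_ctx_slot reported in the logs further down:

```python
# Illustrative only: reject oversized requests early with a clear error
# rather than hanging in the update_slots retry loop. The token estimate
# is a crude heuristic; a real check would use the model's tokenizer.
CTX_WINDOW = 2048        # e.g. the n_ctx_slot value reported in the logs below
CHARS_PER_TOKEN = 4      # rough assumption for illustration

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def check_fits(prompt: str, max_new_tokens: int) -> None:
    needed = estimate_tokens(prompt) + max_new_tokens
    if needed > CTX_WINDOW:
        raise ValueError(
            f"prompt (~{estimate_tokens(prompt)} tokens) + max_tokens ({max_new_tokens}) "
            f"~= {needed} tokens exceeds the {CTX_WINDOW}-token context window"
        )
```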

mudler · May 07 '24 14:05

@mudler I'm not even using that model, as I use my own, and nothing is apparently requesting it. The only thing I did with that model was install it and then delete it after trying it. Is that model gpt-4?

DavidGOrtega · May 07 '24 14:05

An easy way to hang the system is to make several requests in a row to the TTS endpoint (in my case no more than three) to generate speech for a larger text. It hangs and never times out.

Tested with piper and bark
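Here is a minimal repro sketch for the TTS hang described above, with a client-side timeout so the hang shows up as an exception rather than an indefinite block. The /tts path, the payload fields, and the "voice-model" name are assumptions about the LocalAI TTS API; adjust them to the installed backend (piper or bark):

```python
# Sketch only: fire a few back-to-back TTS requests for a large text.
# Endpoint path, payload shape, and model name are assumptions.
import requests

BASE_URL = "http://localhost:8080"
LONG_TEXT = "This is a deliberately long passage. " * 200  # stand-in for a larger text

for i in range(3):  # in the report above, no more than three requests were needed
    try:
        r = requests.post(
            f"{BASE_URL}/tts",
            json={"model": "voice-model", "input": LONG_TEXT},  # hypothetical model name
            timeout=120,
        )
        print(i, r.status_code, len(r.content))
    except requests.Timeout:
        print(i, "timed out client-side (server appears hung)")
        break
```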

DavidGOrtega · May 07 '24 18:05

I can confirm this issue. I use LocalAI (v2.14.0) with Orca2. Here are the logs:

5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_kv_cache_init:      Metal KV buffer size =  1600.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: KV self size  = 1600.00 MiB, K (f16):  800.00 MiB, V (f16):  800.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU  output buffer size =     0.14 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:      Metal compute buffer size =   204.00 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model:        CPU compute buffer size =    14.01 MiB
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph nodes  = 1286
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr llama_new_context_with_model: graph splits = 2
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":502,"message":"initializing slots","n_slots":1}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"initialize","line":514,"message":"new slot","slot_id":0,"n_ctx_slot":2048}
5:57PM INF [llama-cpp] Loads OK
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"launch_slot_with_data","line":887,"message":"slot is processing task","slot_id":0,"task_id":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stdout {"timestamp":1715263031,"level":"INFO","function":"update_slots","line":1787,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
5:57PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:65450): stderr Context exhausted. Slot 0 released (0 tokens in cache)
...
...
...
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
5:53PM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:64115): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16

netandreus · May 09 '24 13:05

I also encountered this issue; did anyone find a solution?


9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
9:53PM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:34437): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1

maxi1134 · Jun 07 '24 21:06