llama.cpp
Retrieval: Fix Memory Leak in Retrieval Query Handling
- [x] I have read the contributing guidelines
Self-reported review complexity:
- [X] Low
- [ ] Medium
- [ ] High
Description

This pull request addresses a memory leak in `retrieval.cpp`, specifically when continuously accepting query inputs. The problem arises from the `llama_batch` initialization and clearing process.
Problem

The `llama_batch_init` function allocates memory on the heap for the batch. However, the current implementation uses `llama_batch_clear` to reset the batch size to `0`, which does not properly free the allocated heap memory. This results in a continuous increase in memory usage as the process runs.
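For illustration, here is a minimal sketch of the leaking pattern, assuming the query loop has roughly this shape. The function name `query_loop` and the loop body are hypothetical, not the actual `retrieval.cpp` code; `llama_batch_add` and `llama_batch_clear` are the batch helpers from `common.h` as named at the time of this change.

```cpp
#include "llama.h"
#include "common.h"

#include <vector>

// Hypothetical sketch of the leaking pattern: a fresh batch is heap-allocated
// for every query, but only "cleared", never freed.
static void query_loop(llama_context * ctx, const std::vector<std::vector<llama_token>> & queries) {
    for (const auto & tokens : queries) {
        // llama_batch_init() allocates the token/pos/seq_id buffers on the heap
        llama_batch query_batch = llama_batch_init((int32_t) tokens.size(), 0, 1);

        for (size_t i = 0; i < tokens.size(); ++i) {
            llama_batch_add(query_batch, tokens[i], (llama_pos) i, { 0 }, true);
        }
        llama_decode(ctx, query_batch);

        // llama_batch_clear() only resets query_batch.n_tokens to 0; the buffers
        // allocated by llama_batch_init() are never released, so memory usage
        // grows with every query that is processed.
        llama_batch_clear(query_batch);
    }
}
```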
Solution

The solution is to ensure that the memory allocated for the `llama_batch` is properly freed after each query is processed. This prevents the memory leak and stabilizes the memory usage of the process.
Changes

Replaced `llama_batch_clear` with `llama_batch_free` to ensure proper memory deallocation.
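Under the same assumptions as the sketch above, the loop after the change releases each per-query batch with `llama_batch_free`; again, `query_loop_fixed` and the loop body are illustrative, not the exact code in `retrieval.cpp`.

```cpp
#include "llama.h"
#include "common.h"

#include <vector>

// Same hypothetical loop with the fix applied: the per-query batch is released
// with llama_batch_free() instead of merely cleared.
static void query_loop_fixed(llama_context * ctx, const std::vector<std::vector<llama_token>> & queries) {
    for (const auto & tokens : queries) {
        llama_batch query_batch = llama_batch_init((int32_t) tokens.size(), 0, 1);

        for (size_t i = 0; i < tokens.size(); ++i) {
            llama_batch_add(query_batch, tokens[i], (llama_pos) i, { 0 }, true);
        }
        llama_decode(ctx, query_batch);

        // llama_batch_free() releases the buffers allocated by llama_batch_init(),
        // so memory usage stays flat no matter how many queries are processed.
        llama_batch_free(query_batch);
    }
}
```

If reusing a single batch across queries were preferred, it could instead be allocated once before the loop and freed once afterwards; either way, the essential point is that every `llama_batch_init` gets a matching `llama_batch_free`.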