llama.cpp
Retrieval: Fix Memory Leak in Retrieval Query Handling
- [x] I have read the contributing guidelines
Self-reported review complexity:
- [X] Low
- [ ] Medium
- [ ] High
Description

This pull request addresses a memory leak in `retrieval.cpp`, specifically when continuously accepting query inputs. The problem arises from the `llama_batch` initialization and clearing process.
Problem

The `llama_batch_init` function allocates memory on the heap for the batch. However, the current implementation uses `llama_batch_clear` to reset the batch size to `0`, which does not properly free the allocated heap memory. This results in a continuous increase in memory usage as the process runs.
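For illustration, here is a minimal sketch of the leaking pattern, assuming the query loop has roughly this shape. The function name `query_loop` and the loop body are hypothetical, not the actual `retrieval.cpp` code; `llama_batch_add` and `llama_batch_clear` are the batch helpers from `common.h` as named at the time of this change.

```cpp
#include "llama.h"
#include "common.h"

#include <vector>

// Hypothetical sketch of the leaking pattern: a fresh batch is heap-allocated
// for every query, but only "cleared", never freed.
static void query_loop(llama_context * ctx, const std::vector<std::vector<llama_token>> & queries) {
    for (const auto & tokens : queries) {
        // llama_batch_init() allocates the token/pos/seq_id buffers on the heap
        llama_batch query_batch = llama_batch_init((int32_t) tokens.size(), 0, 1);

        for (size_t i = 0; i < tokens.size(); ++i) {
            llama_batch_add(query_batch, tokens[i], (llama_pos) i, { 0 }, true);
        }
        llama_decode(ctx, query_batch);

        // llama_batch_clear() only resets query_batch.n_tokens to 0; the buffers
        // allocated by llama_batch_init() are never released, so memory usage
        // grows with every query that is processed.
        llama_batch_clear(query_batch);
    }
}
```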
Solution

The solution is to ensure that the memory allocated for the `llama_batch` is properly freed after each query is processed. This prevents the memory leak and stabilizes the memory usage of the process.
Changes

Replaced `llama_batch_clear` with `llama_batch_free` to ensure proper memory deallocation.
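Under the same assumptions as the sketch above, the loop after the change releases each per-query batch with `llama_batch_free`; again, `query_loop_fixed` and the loop body are illustrative, not the exact code in `retrieval.cpp`.

```cpp
#include "llama.h"
#include "common.h"

#include <vector>

// Same hypothetical loop with the fix applied: the per-query batch is released
// with llama_batch_free() instead of merely cleared.
static void query_loop_fixed(llama_context * ctx, const std::vector<std::vector<llama_token>> & queries) {
    for (const auto & tokens : queries) {
        llama_batch query_batch = llama_batch_init((int32_t) tokens.size(), 0, 1);

        for (size_t i = 0; i < tokens.size(); ++i) {
            llama_batch_add(query_batch, tokens[i], (llama_pos) i, { 0 }, true);
        }
        llama_decode(ctx, query_batch);

        // llama_batch_free() releases the buffers allocated by llama_batch_init(),
        // so memory usage stays flat no matter how many queries are processed.
        llama_batch_free(query_batch);
    }
}
```

If reusing a single batch across queries were preferred, it could instead be allocated once before the loop and freed once afterwards; either way, the essential point is that every `llama_batch_init` gets a matching `llama_batch_free`.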