
Retrieval: Fix Memory Leak in Retrieval Query Handling

Open gtygo opened this pull request 6 months ago • 1 comment

  • [x] I have read the contributing guidelines

  • Self-reported review complexity:

    • [x] Low
    • [ ] Medium
    • [ ] High
  • Description This pull request fixes a memory leak in retrieval.cpp that occurs when the example continuously accepts query inputs. The leak stems from how the per-query llama_batch is initialized and cleared.

  • Problem llama_batch_init allocates the batch's buffers on the heap. The current implementation then calls llama_batch_clear, which only resets the token count to 0 and never frees those heap buffers. Because a fresh batch is initialized for every query, memory usage grows without bound as the process keeps serving queries.

  • Solution Ensure that the memory allocated for each llama_batch is freed once its query has been processed. This removes the leak and keeps the process's memory usage stable across repeated queries.

  • Changes Replaced llama_batch_clear with llama_batch_free to ensure proper memory deallocation.

gtygo — Aug 09 '24 17:08