
Increasing memory usage when using replace_all_objects

Open AugPro opened this issue 2 years ago • 0 comments

Hello, I have a memory issue when using replace_all_objects. When calling this function with a large number of documents (5 million), I pass an iterator to minimize memory consumption. I expect memory usage to stay flat during the operation, but it keeps increasing (see image below). [image]

Upon investigation, the increase appears to come from the function SearchIndex._chunk, and more specifically from the list raw_responses, which stores the response of every request sent. https://github.com/algolia/algoliasearch-client-python/blob/3bb9108d9dff627f12c921ad23dab02984f70a44/algoliasearch/search_index.py#L505-L528
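To make the pattern concrete, here is a minimal sketch of what I mean. It is not the library's actual code: chunked, make_fake_sender, send_keeping_responses and send_keeping_task_ids are hypothetical names, and the fake sender only imitates the shape of the batch response. It contrasts keeping every full response with keeping only the taskID of each one.

```python
from itertools import count, islice
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(objects: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size objects without materializing the input."""
    iterator = iter(objects)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def make_fake_sender() -> Callable[[List[dict]], Dict[str, object]]:
    """Hypothetical stand-in for the batch endpoint: echoes the objectIDs back."""
    task_ids = count(1)

    def send(batch: List[dict]) -> Dict[str, object]:
        return {
            "taskID": next(task_ids),
            "objectIDs": [obj["objectID"] for obj in batch],
        }

    return send


def send_keeping_responses(objects, batch_size, send):
    """Keeps every response, so all objectIDs stay referenced until the end."""
    raw_responses = []
    for batch in chunked(objects, batch_size):
        raw_responses.append(send(batch))
    return raw_responses


def send_keeping_task_ids(objects, batch_size, send):
    """Keeps only the taskID; each response's objectIDs list can be freed."""
    task_ids = []
    for batch in chunked(objects, batch_size):
        response = send(batch)
        task_ids.append(response["taskID"])
    return task_ids
```

With the first variant, the retained data grows with the total number of documents; with the second, it grows only with the number of batches.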

This is a problem because the response from /1/indexes/{indexName}/batch contains the list of objectIDs:

{
  "taskID": 792,
  "objectIDs": ["6891", "6892"]
}

With 5M documents, each with an objectID of ~15 characters, the objectID strings alone account for roughly 300 MB:

>>> sys.getsizeof("123456789012345") * 5_000_000 / (1024**2)
305.17578125
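The same estimate can be written as a small helper. This is only a rough sketch: estimate_object_ids_mib is a hypothetical name, it assumes ASCII objectIDs, and it adds one pointer per ID for the list slot that references it. The result of sys.getsizeof depends on the CPython version and platform, so the exact figure will vary.

```python
import struct
import sys


def estimate_object_ids_mib(id_count: int, id_length: int) -> float:
    """Rough lower bound, in MiB, for id_count retained ASCII objectID strings."""
    sample = "x" * id_length  # assumption: IDs are ASCII strings of this length
    pointer_size = struct.calcsize("P")  # size of one list slot on this build
    bytes_per_id = sys.getsizeof(sample) + pointer_size
    return id_count * bytes_per_id / (1024 ** 2)


if __name__ == "__main__":
    print(f"{estimate_object_ids_mib(5_000_000, 15):.1f} MiB")
```

It ignores the per-response dict and list overhead, so actual usage should be somewhat higher.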

Is there a request_option that tells the API not to return objectIDs, or a way for the client not to store them in raw_responses?

Thank you 🙏

AugPro, Jun 07 '23 13:06