django-elasticsearch-dsl
Out of memory when populating a large dataset
I am trying to re-index more than 100 million documents, but the process fails because the machine runs out of RAM.
Could the problem be in the Elasticsearch Python client (elasticsearch-py) and how it performs parallel indexing?
Here is an issue that discusses the memory leak: https://github.com/elastic/elasticsearch-py/issues/1101#issuecomment-586217960
It looks like my memory fills up after this line when using streaming_bulk: https://github.com/elastic/elasticsearch-py/blob/8d10e1545e2572d3ab1e92cfaf0968085145eb4d/elasticsearch/helpers/actions.py#L232
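One way to test the parallel-indexing hypothesis is to make sure the input generator is never read much further ahead than the bulk requests actually being sent. The sketch below is only an illustration under that assumption. It does not show how elasticsearch-py or django-elasticsearch-dsl work internally, and it does not confirm that read-ahead causes the leak. `chunked`, `bounded_parallel`, and `send_chunk` are hypothetical names. `send_chunk` stands in for whatever sends one bulk request. The sketch keeps at most `max_pending` chunks in flight and pulls a new chunk from the input only after the oldest one has finished.

```python
import itertools
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def chunked(iterable, size):
    """Yield lists of up to `size` items, pulling lazily from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def bounded_parallel(actions, send_chunk, chunk_size=500, thread_count=4, max_pending=8):
    """Send chunks of `actions` on worker threads, reading input only as chunks finish.

    Hypothetical helper: `send_chunk` is any callable taking a list of actions
    (e.g. a wrapper around a bulk request). At most `max_pending` chunks are
    submitted but not yet yielded back, so the input is read at most roughly
    max_pending * chunk_size items ahead of the results the caller consumed.
    """
    chunks = chunked(actions, chunk_size)
    pending = deque()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        while True:
            if len(pending) >= max_pending:
                # Backpressure: wait for the oldest chunk before reading more input.
                yield pending.popleft().result()
            chunk = next(chunks, None)
            if chunk is None:
                break
            pending.append(pool.submit(send_chunk, chunk))
        # Input exhausted: drain the remaining in-flight chunks in submission order.
        while pending:
            yield pending.popleft().result()
```

For example, with `len` standing in for the bulk request, `for n in bounded_parallel(range(1000), send_chunk=len, chunk_size=100): print(n)` prints 100 ten times. The sketch uses a deque of futures instead of `ThreadPool.imap` so that reading the input depends on finished work. Memory stays bounded by `max_pending * chunk_size` actions no matter how large the source is.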