OpenSearch
Heap usage reduction in OpenSearch
From Lucene's documentation,
Re-use Document and Field instances

As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance. Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.
OpenSearch does not implement this recommendation, which leads to a large number of object allocations and, in turn, high GC overhead. The trade-off is that re-using objects cleanly introduces code complexity and therefore increases the chance of bugs.
This was originally recommended here: https://github.com/elastic/elasticsearch/issues/31479
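
To make the re-use pattern from the Lucene excerpt concrete, here is a minimal sketch in plain Java. It deliberately does not depend on Lucene or OpenSearch: `ReusableField`, `ReusableDocument`, `ToyIndex`, and `indexAll` are hypothetical stand-ins invented for illustration, and the toy index simply copies field values at add time, which is the assumption that makes re-use safe. In recent Lucene versions the corresponding setters are, as far as I know, `Field.setStringValue(...)` and its typed siblings rather than `setValue(...)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration only; these are NOT Lucene or OpenSearch classes. */
final class FieldReuseSketch {

    /** A mutable field whose value can be swapped between added documents. */
    static final class ReusableField {
        final String name;
        private String value;

        ReusableField(String name) { this.name = name; }

        void setValue(String value) { this.value = value; }

        String value() { return value; }
    }

    /** A document is an ordered list of distinct field instances. */
    static final class ReusableDocument {
        final List<ReusableField> fields = new ArrayList<>();

        void add(ReusableField field) { fields.add(field); }
    }

    /** Toy index that copies the field values when a document is added. */
    static final class ToyIndex {
        final List<String> rows = new ArrayList<>();

        void addDocument(ReusableDocument doc) {
            StringBuilder row = new StringBuilder();
            for (ReusableField f : doc.fields) {
                if (row.length() > 0) row.append(';');
                row.append(f.name).append('=').append(f.value());
            }
            rows.add(row.toString());
        }
    }

    /** Indexes every {id, body} record using one document and two field instances. */
    static List<String> indexAll(String[][] records) {
        // Allocate the document and its fields once, outside the loop.
        ReusableField idField = new ReusableField("id");
        ReusableField bodyField = new ReusableField("body");
        ReusableDocument doc = new ReusableDocument();
        doc.add(idField);
        doc.add(bodyField);

        ToyIndex index = new ToyIndex();
        for (String[] record : records) {
            // Change values only after the previous add has completed.
            idField.setValue(record[0]);
            bodyField.setValue(record[1]);
            index.addDocument(doc);
        }
        return index.rows;
    }

    public static void main(String[] args) {
        String[][] records = { {"1", "hello"}, {"2", "world"} };
        indexAll(records).forEach(System.out::println);
    }
}
```

The loop allocates no document or field objects per record, which is where the GC saving would come from. The complexity cost mentioned above shows up in the ordering constraint: a value must not be changed until the add that uses it has finished, so the document and its fields cannot be shared across concurrent indexing threads without extra care.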