Fix ByteBlockPool integer overflow by implementing buffer limit detection
Problem
ByteBlockPool uses 32KB buffers with an integer offset tracker (byteOffset). When more than 65,535 buffers are allocated, integer overflow occurs in the byteOffset calculation (byteOffset = bufferUpto * BYTE_BLOCK_SIZE), causing an ArithmeticException during indexing of documents with large numbers of tokens.
Root Cause
- Each buffer is 32KB (BYTE_BLOCK_SIZE = 32768)
- Maximum safe buffer count: Integer.MAX_VALUE / BYTE_BLOCK_SIZE = 65535
- Once bufferUpto reaches 65,535, advancing to the next buffer overflows: the new byteOffset would exceed Integer.MAX_VALUE, surfacing as the Math.addExact ArithmeticException in ByteBlockPool.nextBuffer seen in the stack trace below
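The arithmetic above can be checked directly. This small demo mirrors only Lucene's BYTE_BLOCK_SIZE constant; the rest is plain int math, not Lucene code:

```java
// Demonstrates the overflow arithmetic behind the bug.
public class OverflowDemo {
    static final int BYTE_BLOCK_SIZE = 1 << 15; // 32768, as in ByteBlockPool

    public static void main(String[] args) {
        // Largest buffer index whose starting offset fits in an int:
        System.out.println(Integer.MAX_VALUE / BYTE_BLOCK_SIZE); // 65535

        // Buffer index 65535 still has a representable offset...
        System.out.println(65_535 * BYTE_BLOCK_SIZE); // 2147450880

        // ...but one more buffer wraps in plain int arithmetic:
        System.out.println(65_536 * BYTE_BLOCK_SIZE); // -2147483648

        // Lucene advances byteOffset with Math.addExact, so the wrap
        // surfaces as an ArithmeticException instead of silent corruption.
        try {
            Math.addExact(65_535 * BYTE_BLOCK_SIZE, BYTE_BLOCK_SIZE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // "integer overflow"
        }
    }
}
```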
Solution
Implement proactive DWPT flushing when the buffer count approaches the limit:
- Detection: Added isApproachingBufferLimit() method to detect when buffer count approaches the overflow threshold
- Propagation: Buffer limit status flows from ByteBlockPool → IndexingChain → DocumentsWriterPerThread → DocumentsWriterFlushControl
- Prevention: Force flush DWPT before overflow occurs, similar to existing RAM-based flushing.
Key Changes
- Added buffer limit detection in ByteBlockPool
- Integrated check into DocumentsWriterFlushControl.doAfterDocument()
- Uses a threshold of 65,000 to provide a safety margin before the actual limit of 65,535
- Maintains existing performance characteristics while preventing crashes
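The detection and flush trigger described above can be sketched as follows. Only isApproachingBufferLimit(), BYTE_BLOCK_SIZE, bufferUpto, and the 65,000 threshold come from the PR description; the rest of the structure is illustrative, not the actual Lucene code:

```java
// Sketch of the buffer-limit detection, under the assumptions above.
public class ByteBlockPoolSketch {
    public static final int BYTE_BLOCK_SIZE = 1 << 15; // 32768
    // Last buffer index whose offset fits in an int: 65535.
    public static final int MAX_BUFFER_COUNT = Integer.MAX_VALUE / BYTE_BLOCK_SIZE;
    // Flush well before the hard limit, per the PR's safety margin.
    public static final int BUFFER_LIMIT_THRESHOLD = 65_000;

    private int bufferUpto = -1; // index of the current buffer

    public void nextBuffer() {
        bufferUpto++; // the real method also allocates and advances byteOffset
    }

    public boolean isApproachingBufferLimit() {
        return bufferUpto >= BUFFER_LIMIT_THRESHOLD;
    }

    public static void main(String[] args) {
        ByteBlockPoolSketch pool = new ByteBlockPoolSketch();
        for (int i = 0; i < 65_001; i++) {
            pool.nextBuffer();
        }
        // With 65,001 buffers allocated (bufferUpto == 65000) the flag
        // trips, so a caller in the flush-control path can force a DWPT
        // flush before the real 65,535 limit is hit.
        System.out.println(pool.isApproachingBufferLimit()); // true
    }
}
```

In the PR this status is consulted per document, alongside the existing RAM-based flush decision, rather than on every buffer allocation.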
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
> When more than 65,535 buffers are allocated, integer overflow occurs in the byteOffset calculation (byteOffset = bufferUpto * BYTE_BLOCK_SIZE), causing ArithmeticException during indexing of documents with large numbers of tokens.

But this is not supported: the limits on IndexWriter are 2GB.
maybe AI-generated? The bullet-point formatting looks characteristic. Not that that is banned or anything, but it might need additional scrutiny.
Hi @rmuir @msokolov, I have yet to review this PR, but I see your point that the hard limit check should be enough, as it accounts for the byteBlockPool as well.
For context, I originally created issue https://github.com/apache/lucene/issues/15152, where an OpenSearch user encountered the ByteBlockPool overflow during recovery.
message [shard failure, reason [index id[3458764570588151359] origin[LOCAL_TRANSLOG_RECOVERY] seq#[53664468]]], failure [NotSerializableExceptionWrapper[arithmetic_exception: integer overflow]], markAsStale [true]]
NotSerializableExceptionWrapper[arithmetic_exception: integer overflow]
at java.lang.Math.addExact(Math.java:883)
at org.apache.lucene.util.ByteBlockPool.nextBuffer(ByteBlockPool.java:199)
at org.apache.lucene.index.ByteSlicePool.allocKnownSizeSlice(ByteSlicePool.java:118)
at org.apache.lucene.index.ByteSlicePool.allocSlice(ByteSlicePool.java:98)
at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:226)
at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:266)
at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:86)
at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:197)
at org.apache.lucene.index.TermsHashPerField.positionStreamSlice(TermsHashPerField.java:214)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:202)
at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1287)
at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:731)
at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1516)
at org.opensearch.index.engine.InternalEngine.addStaleDocs(InternalEngine.java:1291)
at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1210)
at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1226)
I think the IndexWriterHardLimit check in FlushControl comes after DocumentsWriter.updateDocuments, so adding many documents in a single call could exceed the limit and hit this exception before the check ever runs.
- Do we need headroom in the writer limits to account for the next set of documents?
- Do we need to limit the number of docs that can be passed to this method?
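One purely hypothetical way to frame the headroom question: before processing a batch, compare a worst-case buffer estimate against the remaining room below the hard limit. The method name, the estimation strategy, and estimatedBatchBytes are all invented here for illustration and do not exist in Lucene:

```java
// Hypothetical headroom check, not Lucene code.
final class BufferHeadroom {
    static final int BYTE_BLOCK_SIZE = 1 << 15; // 32768
    static final int MAX_BUFFER_COUNT = Integer.MAX_VALUE / BYTE_BLOCK_SIZE; // 65535

    /** True if the current pool plus a worst-case estimate for the next
     *  batch of documents stays below the hard buffer limit. */
    static boolean hasHeadroom(int currentBufferCount, long estimatedBatchBytes) {
        long neededBuffers =
            (estimatedBatchBytes + BYTE_BLOCK_SIZE - 1) / BYTE_BLOCK_SIZE; // ceil
        return currentBufferCount + neededBuffers < MAX_BUFFER_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(hasHeadroom(1_000, 1L << 20));  // small pool, 1 MB batch: true
        System.out.println(hasHeadroom(65_500, 1L << 30)); // near the limit: false
    }
}
```

Whether such a pre-check belongs in DocumentsWriter or in the flush control is exactly what the questions above are asking; this only illustrates the arithmetic.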
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!