lucene-solr
LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
Today, if you add documents that cause a single DWPT to grow beyond its maximum RAM buffer size, we just keep on indexing. If a user misuses our IndexWriter#addDocuments API and provides a very large iterable, we will run into an ArrayIndexOutOfBoundsException down the road and abort the IndexWriter. This change is not bulletproof, but a best effort to ensure that we fail with a better exception message and don't abort the IndexWriter.
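The guard described above can be sketched without any Lucene dependencies. The class `RamLimitedBuffer`, the constant `HARD_LIMIT_BYTES`, and `addDocument` below are hypothetical stand-ins; the real check lives inside Lucene's DocumentsWriterPerThread, and the actual hard limit is about 2 GB rather than the tiny demo value used here.

```java
// Simplified, Lucene-free sketch of the fail-fast guard this PR adds.
// All names here are illustrative, not Lucene's real API.
class RamLimitedBuffer {
    // Stand-in for the per-DWPT hard limit (scaled way down for the demo).
    static final long HARD_LIMIT_BYTES = 64;

    private long ramBytesUsed = 0;

    /**
     * Accounts one "document" of the given size, failing fast with a clear
     * message instead of corrupting int-addressed structures later on.
     */
    void addDocument(long docBytes) {
        if (ramBytesUsed + docBytes > HARD_LIMIT_BYTES) {
            // Non-aborting: the writer stays usable; the buffer gets flushed.
            throw new IllegalArgumentException(
                "RAM buffer would exceed hard limit of " + HARD_LIMIT_BYTES + " bytes");
        }
        ramBytesUsed += docBytes;
    }

    long ramBytesUsed() {
        return ramBytesUsed;
    }
}
```

The point of the sketch is the ordering: the limit is checked before any state is mutated, so a rejected batch leaves the buffer consistent.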
- fixing exception handling in DW.updateDocuments to handle this exception gracefully (non-aborting).
The problem in this method was that we need to flush the DWPT to disk even if we hit a non-aborting exception. Otherwise the next doc would hit the same exception, or we would violate the assumption that we never receive a flush-pending DWPT.
One thing that I keep thinking about is what happens if we index documents close to the limit and the next batch of documents would then exceed that limit. That would mean we reject docs even though we could index them into a different DWPT. Should we at least try to index it again in a fresh DWPT, unless the DWPT that rejected it was empty? That would prevent the issue and should be straightforward to implement. I can work on this in a follow-up.
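The retry idea above can be sketched in isolation: if a non-empty buffer rejects a batch, retry once against a fresh buffer; only an empty buffer rejecting a batch proves the batch itself can never fit. The class `RetryingWriter` and its members below are hypothetical, not Lucene's actual implementation.

```java
// Hypothetical sketch of the "retry in a fresh DWPT" idea.
class RetryingWriter {
    // Scaled-down stand-in for the per-buffer hard limit.
    static final long HARD_LIMIT_BYTES = 64;

    private long current = 0; // bytes used by the current buffer

    /**
     * Returns true once the batch is accepted; throws only if even a
     * completely empty buffer could not hold it.
     */
    boolean addBatch(long batchBytes) {
        if (current + batchBytes <= HARD_LIMIT_BYTES) {
            current += batchBytes;
            return true;
        }
        if (current == 0) {
            // A fresh (empty) buffer rejected it: retrying cannot help.
            throw new IllegalArgumentException("batch exceeds hard limit");
        }
        // Flush the full buffer and retry exactly once against a fresh one.
        current = 0;
        return addBatch(batchBytes);
    }
}
```

Note the termination argument: the retry resets `current` to zero, so the second attempt either succeeds or hits the empty-buffer case and throws; there is no unbounded recursion.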
> - fixing exception handling in DW.updateDocuments to handle this exception gracefully (non-aborting).
> The problem in this method was that we need to flush the DWPT to disk even if we hit a non-aborting exception. Otherwise the next doc would hit the same exception, or we would violate the assumption that we never receive a flush-pending DWPT.
Thanks for the explanation; makes sense.
> One thing that I keep thinking about is what happens if we index documents close to the limit and the next batch of documents would then exceed that limit. That would mean we reject docs even though we could index them into a different DWPT. Should we at least try to index it again in a fresh DWPT, unless the DWPT that rejected it was empty? That would prevent the issue and should be straightforward to implement. I can work on this in a follow-up.
+1, that's an awesome idea! In general, when picking among the N free DWPTs when a new document arrives, shouldn't we pick the DWPT that's smallest (least `.ramBytesUsed()`)?
@mikemccand @dnhatn I added the retry logic, added a test, and fixed some minor things. It's changed quite a bit; I think it warrants another review.
> +1, that's an awesome idea! In general, when picking among the N free DWPTs when a new document arrives, shouldn't we pick the DWPT that's smallest (least `.ramBytesUsed()`)?
I will address picking the smallest DWPT in a separate PR.
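For reference, the smallest-buffer heuristic is a simple min-by-`ramBytesUsed()` selection. The interface `RamAccountable` and class `DwptPool` below are hypothetical names for the sketch; Lucene's real pool bookkeeping is more involved.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the "pick the smallest free DWPT" heuristic.
interface RamAccountable {
    long ramBytesUsed();
}

class DwptPool {
    /**
     * Returns the free buffer using the least RAM, steering new documents
     * away from buffers already close to the hard limit.
     */
    static <T extends RamAccountable> T pickSmallest(List<T> free) {
        return free.stream()
                   .min(Comparator.comparingLong(RamAccountable::ramBytesUsed))
                   .orElseThrow(() -> new IllegalStateException("no free DWPT"));
    }
}
```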
@dnhatn that's a good question. One option here would be to fix Lucene's `Field` to accept a `Supplier<Reader>` instead of a reader instance. That way we can consume it multiple times, and for back-compat we can keep the reader ctor. This is something we could benefit from in multiple other places too, I guess. @mikemccand WDYT?
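The `Supplier<Reader>` idea can be shown with plain JDK types: because each call to the supplier produces a fresh `Reader`, the value survives a failed first attempt and can be consumed again on retry. The class `ReReadableField` below is a hypothetical illustration, not Lucene's `Field`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical sketch: a field-like holder whose value can be read
// more than once, e.g. when indexing is retried in a fresh DWPT.
class ReReadableField {
    private final Supplier<Reader> readerSupplier;

    ReReadableField(Supplier<Reader> readerSupplier) {
        this.readerSupplier = readerSupplier;
    }

    /** Each call returns a fresh Reader, so a failed attempt can retry. */
    Reader readerValue() {
        return readerSupplier.get();
    }

    /** Demo helper: fully drains one fresh Reader into a String. */
    String consumeOnce() {
        StringBuilder sb = new StringBuilder();
        try (Reader r = readerValue()) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With a plain `Reader`-valued field the second consumption would see an exhausted stream; the supplier variant sidesteps that while a reader-taking ctor could remain for back-compat.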
Another thing we could consider is changing DWPT's postings addressing from `int` to `long` so we don't need retry logic.
We could do it always, increasing the per-unique-term memory cost. Or we could maybe find a way to do it conditionally, when a given DWPT wants to exceed the 2.1 GB limit, but that'd be trickier.
> Another thing we could consider is changing DWPT's postings addressing from `int` to `long` so we don't need retry logic.
I will look into this. My biggest fear is that we miss places where we exceed limits, and testing will be difficult. Yet there is no rush with this, and I have time to explore fixing `IntBlockPool` addressing.
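The 2.1 GB figure discussed above comes from `int` addressing: the largest `int` offset is `Integer.MAX_VALUE` (2^31 - 1 = 2,147,483,647 bytes). A minimal illustration of why widening to `long` removes the cap, using hypothetical helper names:

```java
// Illustration of int vs long offset arithmetic near the 2.1 GB boundary.
// AddressWidth and its methods are demo names, not Lucene code.
class AddressWidth {
    /** Advances an int offset; silently wraps negative past 2^31 - 1. */
    static int nextIntOffset(int current, int length) {
        return current + length;
    }

    /** Advances a long offset; stays correct well beyond 2.1 GB. */
    static long nextLongOffset(long current, int length) {
        return current + length;
    }
}
```

The silent wrap-around is exactly why exceeding the limit surfaces later as an `ArrayIndexOutOfBoundsException` rather than a clean error, and why the conditional-widening variant would be tricky to get right everywhere offsets are computed.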