[Bug] Stored fields force merge regression between Lucene 9.12 and Lucene 10.0

Open bharath-techie opened this issue 8 months ago • 3 comments

Description

We have observed force merge of stored fields opened with MMapDirectory / MemorySegmentIndexInput regressing by 50% to 100% in Lucene 10.0 compared to Lucene 9.12.

After debugging quite a bit, I realized that changing the DEFAULT_READADVICE back to SEQUENTIAL fixes the issue. [Reference]

I have made draft changes - https://github.com/apache/lucene/compare/main...bharath-techie:lucene:merge-fix - similar to those made in https://github.com/apache/lucene/pull/13985, and verified that they fix the regression. [Includes the Postings fix]

But I am raising this issue to also figure out why this fix is not needed in 9.12, since the code in Lucene90CompressingStoredFieldsReader has not changed:

  1. Lucene 10 has the FDT input initialized with RANDOM read advice [reference]

  2. Lucene 9.12 also has the FDT input initialized with a context with RandomAccess [reference]

  3. During the merge flow, fieldsStream gets cloned here. I checked the code in 10.0, and the new object still refers to the same memory segments as the previous reader, which was initialized with random madvise [singleSegmentImpl]. I also tried reverting the changes in the clone flow to match 9.12, and the regression was still present.

[Note: compound is true and the FDT is within the compound file. Changing the ReadAdvice in the compound format didn't fix the regression, but I am adding this point for reference anyway.]

The only difference is the default context change between 9.12 and 10.0, which went from SEQUENTIAL to RANDOM. But the input was already initialized with RANDOM in both versions.
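
To make point 3 concrete, here is a toy model (plain Java with invented names, not the actual Lucene code) of why cloning does not reset the read advice: madvise state belongs to the memory mapping itself, and a clone shares the parent's mappings, so whatever advice was applied when the input was opened is still in effect for the merge-time clone.

```java
import java.util.List;

// Toy stand-in for a memory-mapped segment; the advice applied via
// madvise() at open time is a property of the mapping itself.
final class Segment {
    String advice = "NORMAL";
    void madvise(String advice) { this.advice = advice; }
}

// Toy stand-in for MemorySegmentIndexInput: clone() shares segments.
final class ToyInput {
    final List<Segment> segments;
    ToyInput(List<Segment> segments, String advice) {
        this.segments = segments;
        segments.forEach(s -> s.madvise(advice)); // advice applied at open
    }
    private ToyInput(List<Segment> segments) { this.segments = segments; }
    ToyInput cloneInput() {
        return new ToyInput(segments); // no madvise here: segments are shared
    }
}

public class CloneAdviceDemo {
    public static void main(String[] args) {
        ToyInput fieldsStream = new ToyInput(List.of(new Segment()), "RANDOM");
        ToyInput mergeClone = fieldsStream.cloneInput();
        // The merge-time clone still sees RANDOM advice on the shared mapping.
        System.out.println(mergeClone.segments.get(0).advice);
    }
}
```

This also matches the observation that reverting only the clone flow did not help: the advice was already baked into the mappings before the clone ever happened.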

I am looking to hear from the community on why the default context change affects Lucene 10.0, and whether the draft approach works, so that I can raise a PR.

cc : @jpountz @uschindler

Version and environment details

Create a Big5 index using the OSB workload on both OpenSearch 2.19 [Lucene 9.12.1] and OpenSearch 3.0/main [Lucene 10.1]. Keep a high flush threshold and disable refresh to keep segment sizes similar between the two versions:

opensearch-benchmark execute-test --target-hosts http://127.0.0.1:9200 --workload big5 --client-options timeout:120 --workload-params="number_of_shards:1,bulk_indexing_clients:1,number_of_replicas:0" --kill-running-processes --include-tasks delete-index,create-index,check-cluster-health,index-append,refresh-after-index,force-merge,refresh-after-force-merge,wait-until-merges-finish

Then perform a force merge down to 1 segment.


3.0

index shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound
big5  0     p      127.0.0.1 _b              11   21474616            0   4.4gb           0 true      true       10.1.0  false
big5  0     p      127.0.0.1 _h              17   21556006            0   4.4gb           0 true      true       10.1.0  false
big5  0     p      127.0.0.1 _p              25   21447620            0   4.4gb           0 true      true       10.1.0  false
big5  0     p      127.0.0.1 _x              33   21544817            0   4.4gb           0 true      true       10.1.0  false
big5  0     p      127.0.0.1 _12             38    3582255            0 754.9mb           0 true      true       10.1.0  true
big5  0     p      127.0.0.1 _13             39    3579532            0 754.1mb           0 true      true       10.1.0  true
big5  0     p      127.0.0.1 _14             40    1071872            0 228.9mb           0 true      true       10.1.0  true
big5  0     p      127.0.0.1 _15             41   21743282            0   4.5gb           0 true      true       10.1.0  false

2.19

index shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound
big5  0     p      127.0.0.1 _b              11   21474616            0   4.4gb           0 true      true       9.12.1  false
big5  0     p      127.0.0.1 _h              17   21427535            0   4.4gb           0 true      true       9.12.1  false
big5  0     p      127.0.0.1 _p              25   21549736            0   4.4gb           0 true      true       9.12.1  false
big5  0     p      127.0.0.1 _y              34   21568569            0   4.4gb           0 true      true       9.12.1  false
big5  0     p      127.0.0.1 _15             41    3581274            0 754.6mb           0 true      true       9.12.1  true
big5  0     p      127.0.0.1 _16             42    3578536            0 753.9mb           0 true      true       9.12.1  true
big5  0     p      127.0.0.1 _17             43   21889332            0   4.5gb           0 true      true       9.12.1  false
big5  0     p      127.0.0.1 _18             44     930402            0 199.1mb

Per-format merge times:

| Format | 9.12.1 | 10.1 | Difference (10.1 − 9.12.1) | % Change |
|---|---|---|---|---|
| Stored Fields | 282,483 ms | 402,009 ms | +119,526 ms | +42.31% |
| Norms | 7,993 ms | 6,186 ms | -1,807 ms | -22.61% |
| Postings | 474,146 ms | 571,138 ms | +96,992 ms | +20.46% |
| Doc Values | 206,165 ms | 203,756 ms | -2,409 ms | -1.17% |
| Points | 87,173 ms | 78,683 ms | -8,490 ms | -9.74% |
| Field Infos | 0 ms | 0 ms | 0 ms | 0% |
| Total Runtime | 1,058.0 s | 1,261.8 s | +203.8 s | +19.26% |
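
As a sanity check on the table above, each % Change column is simply (t_10.1 − t_9.12.1) / t_9.12.1, e.g. for Stored Fields:

```java
public class RegressionMath {
    public static void main(String[] args) {
        // Stored Fields merge times from the table above, in ms.
        double lucene912 = 282_483, lucene101 = 402_009;
        double pct = (lucene101 - lucene912) / lucene912 * 100;
        System.out.printf("Stored Fields regression: %.2f%%%n", pct); // 42.31%
    }
}
```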

bharath-techie avatar Apr 10 '25 16:04 bharath-techie

Looks like compound being true and the FDT being within the compound file matters here after all; it ended up being the root cause.

`fieldsStream = d.openInput(fieldsStreamFN, context.withRandomAccess()); // d is compoundDirectory`

In 3.0

In the compound reader's openInput, `handle.slice(name, entry.offset, entry.length, context.readAdvice())` in Lucene90CompoundReader calls `public final MemorySegmentIndexInput slice(String sliceDescription, long offset, long length, ReadAdvice advice)`, which madvises all memory segments [with RANDOM].

In 2.19

Looks like `handle.slice(name, entry.offset, entry.length)` in Lucene90CompoundReader calls `public final MemorySegmentIndexInput slice(String sliceDescription, long offset, long length)`, which does not call madvise on the memory segments and simply initializes them. Hence this path was still fast in 2.19.
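
A toy model of that behavioral difference (invented names, not the actual Lucene code): the 9.12-style slice overload leaves the backing mappings untouched, while the 10.0-style overload re-advises every mapping backing the slice.

```java
import java.util.List;

// Toy stand-in for one memory mapping backing a compound file.
final class Mapping {
    String advice = "NORMAL"; // what madvise() last set on this mapping
    void madvise(String advice) { this.advice = advice; }
}

final class ToySlicer {
    final List<Mapping> mappings;
    ToySlicer(List<Mapping> mappings) { this.mappings = mappings; }

    // 9.12-style slice: no advice parameter, mappings left untouched.
    ToySlicer slice(long offset, long length) {
        return new ToySlicer(mappings);
    }

    // 10.0-style slice: madvises every mapping backing the slice.
    ToySlicer slice(long offset, long length, String advice) {
        mappings.forEach(m -> m.madvise(advice));
        return new ToySlicer(mappings);
    }
}

public class SliceAdviceDemo {
    public static void main(String[] args) {
        ToySlicer v912 = new ToySlicer(List.of(new Mapping()));
        v912.slice(0, 100);
        System.out.println("9.12-style: " + v912.mappings.get(0).advice);  // NORMAL

        ToySlicer v101 = new ToySlicer(List.of(new Mapping()));
        v101.slice(0, 100, "RANDOM");
        System.out.println("10.0-style: " + v101.mappings.get(0).advice); // RANDOM
    }
}
```

So in 10.0 the merge-time reads of the FDT slice go through mappings advised RANDOM (readahead disabled), while in 9.12 the slice inherited the default kernel behavior, which explains why the regression only shows up with compound files.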

bharath-techie avatar Apr 14 '25 04:04 bharath-techie

Thanks @bharath-techie for the deep dive and for root-causing this.

@uschindler Any thoughts on whether it is OK to proceed with updating the ReadAdvice for merges?

mgodwan avatar Apr 15 '25 10:04 mgodwan

go ahead!

uschindler avatar Jun 10 '25 14:06 uschindler