Exponential Unrolled List: read_to_end in expull may consume a lot of memory. Because the list is used by the posting list recorder, it contains all doc ids (plus optional positions and term frequencies) for one term, so the copy can get large. This PR replaces the copy with an iterator.
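To illustrate the difference, here is a minimal sketch, not the actual expull code. `UnrolledBytes`, `FIRST_BLOCK_LEN`, and `iter_bytes` are hypothetical names, and the doubling block size is an assumption made for illustration. `read_to_end` copies every block into one contiguous buffer, while `iter_bytes` walks the blocks in place without an extra allocation.

```rust
/// Hypothetical size of the first block; every later block doubles it.
const FIRST_BLOCK_LEN: usize = 4;

/// Hypothetical stand-in for an unrolled list of bytes: the data lives in
/// separate blocks, so there is no single contiguous buffer to borrow.
struct UnrolledBytes {
    blocks: Vec<Vec<u8>>,
    /// Maximum number of bytes the last block may hold.
    last_block_limit: usize,
}

impl UnrolledBytes {
    fn new() -> Self {
        UnrolledBytes { blocks: Vec::new(), last_block_limit: 0 }
    }

    /// Appends one byte, opening a new (twice as large) block when needed.
    fn push(&mut self, byte: u8) {
        let is_full = self
            .blocks
            .last()
            .map_or(true, |block| block.len() == self.last_block_limit);
        if is_full {
            self.last_block_limit = if self.blocks.is_empty() {
                FIRST_BLOCK_LEN
            } else {
                self.last_block_limit * 2
            };
            self.blocks.push(Vec::with_capacity(self.last_block_limit));
        }
        self.blocks.last_mut().unwrap().push(byte);
    }

    /// Copying variant: materializes the whole list in `output`.
    /// Peak memory grows with the total length of the list.
    fn read_to_end(&self, output: &mut Vec<u8>) {
        for block in &self.blocks {
            output.extend_from_slice(block);
        }
    }

    /// Iterator variant: yields the same bytes block by block,
    /// without allocating a second copy.
    fn iter_bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.blocks.iter().flat_map(|block| block.iter().copied())
    }
}
```

Both variants yield the same byte sequence; only the copying one needs memory proportional to the full list.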

Mar 20 '22 04:03 PSeitz

Codecov Report

Merging #1319 (f453a6f) into main (46d5de9) will increase coverage by 0.02%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1319      +/-   ##
==========================================
+ Coverage   94.25%   94.27%   +0.02%     
==========================================
  Files         232      232              
  Lines       40801    40790      -11     
==========================================
- Hits        38457    38456       -1     
+ Misses       2344     2334      -10

Impacted Files	Coverage Δ
common/src/lib.rs	`89.33% <ø> (ø)`
common/src/vint.rs	`92.34% <100.00%> (-0.01%)`	:arrow_down:
src/postings/recorder.rs	`98.26% <100.00%> (-0.30%)`	:arrow_down:
src/postings/stacker/expull.rs	`99.10% <100.00%> (+0.05%)`	:arrow_up:
src/store/index/mod.rs	`98.37% <0.00%> (+0.54%)`	:arrow_up:
src/indexer/segment_updater.rs	`95.93% <0.00%> (+1.04%)`	:arrow_up:
src/fastfield/serializer/mod.rs	`92.75% <0.00%> (+1.44%)`	:arrow_up:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 46d5de9...f453a6f.

Mar 20 '22 04:03 codecov-commenter

I prefer to work with a &[u8]. It makes it much easier to optimize things.

Did you observe a performance regression / improvement? Did it shave off the memory peaks you observed before during indexing?

Mar 21 '22 01:03 fulmicoton

> Did you observe a performance regression / improvement? Did it shave off the memory peaks you observed before during indexing?

I didn't see an impact on indexing performance.

Before the change I noticed a big chunk (33.6MB) allocated due to read_to_end, which is now gone. This also makes sense, since the posting lists can get huge for some terms.

> I prefer to work with a &[u8]. It makes it much easier to optimize things.

An iterator was already used for decompressing a single vint. I agree that a &[u8] is easier to optimize. What I would prefer here is to end each block on a vint boundary, so that we can create a vint iterator over the blocks.
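A rough sketch of that idea, under stated assumptions: `VintBlocks`, `BlockVints`, `write_vint`, and `read_vint` are hypothetical names, the fixed block size is a simplification, and the LEB128-style byte layout is chosen for illustration and is not necessarily the layout tantivy's vint uses. Because no vint straddles two blocks, each block can be decoded from a plain `&[u8]`.

```rust
/// Encodes `value` with 7 payload bits per byte; the high bit is set on
/// every byte except the last one (illustrative layout, see lead-in).
fn write_vint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value as u8) & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes one vint from the front of `data`. Returns the value and the
/// number of bytes consumed, or `None` if `data` is empty or ends mid-vint.
fn read_vint(data: &[u8]) -> Option<(u32, usize)> {
    let mut value = 0u32;
    for (i, &byte) in data.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7F) << (7 * i);
        if (byte & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Hypothetical block list in which every block ends on a vint boundary.
struct VintBlocks {
    blocks: Vec<Vec<u8>>,
    /// Maximum block size in bytes; must be at least 5 (longest u32 vint).
    block_len: usize,
}

impl VintBlocks {
    fn new(block_len: usize) -> Self {
        assert!(block_len >= 5);
        VintBlocks { blocks: Vec::new(), block_len }
    }

    /// Appends a value; if its encoding does not fit into the remaining
    /// space of the last block, a new block is started instead of splitting.
    fn push(&mut self, value: u32) {
        let mut encoded = Vec::with_capacity(5);
        write_vint(value, &mut encoded);
        let fits = self
            .blocks
            .last()
            .map_or(false, |block| block.len() + encoded.len() <= self.block_len);
        if !fits {
            self.blocks.push(Vec::with_capacity(self.block_len));
        }
        self.blocks.last_mut().unwrap().extend_from_slice(&encoded);
    }

    /// Iterates over all values by decoding each block slice in turn.
    fn iter(&self) -> impl Iterator<Item = u32> + '_ {
        self.blocks
            .iter()
            .flat_map(|block| BlockVints { data: block.as_slice() })
    }
}

/// Decoder over a single block; works directly on a `&[u8]`.
struct BlockVints<'a> {
    data: &'a [u8],
}

impl<'a> Iterator for BlockVints<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let (value, consumed) = read_vint(self.data)?;
        self.data = &self.data[consumed..];
        Some(value)
    }
}
```

The trade-off in this sketch is a few unused bytes at the end of a block whenever the next vint does not fit.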

Mar 22 '22 02:03 PSeitz

Expull: replace read_to_end with iterator over bytes
