tantivy
Expull: replace read_to_end with iterator over bytes
read_to_end in expull (the Exponential Unrolled List) may consume a lot of memory. It is used by the posting list recorder, so the copied buffer contains all doc ids (plus optional positions and term frequencies) for one term. This PR replaces the copy with an iterator over the bytes.
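To illustrate the idea, here is a minimal sketch of the two access patterns. `BlockList`, `block_capacity`, and `iter_bytes` are hypothetical names made up for this example; this is not the actual expull code, only a stand-in that stores bytes in blocks of doubling size.

```rust
/// Hypothetical stand-in for an unrolled linked list: bytes live in blocks
/// whose capacity doubles up to a cap. Not tantivy's actual data structure.
pub struct BlockList {
    blocks: Vec<Vec<u8>>,
}

/// Capacity of the block at `index` (4, 8, 16, ... capped at 16 KiB).
fn block_capacity(index: usize) -> usize {
    4usize << index.min(12)
}

impl BlockList {
    pub fn new() -> Self {
        BlockList { blocks: Vec::new() }
    }

    /// Appends one byte, opening a new block when the last one is full.
    pub fn push(&mut self, byte: u8) {
        let is_full = match self.blocks.last() {
            Some(block) => block.len() == block_capacity(self.blocks.len() - 1),
            None => true,
        };
        if is_full {
            let capacity = block_capacity(self.blocks.len());
            self.blocks.push(Vec::with_capacity(capacity));
        }
        self.blocks.last_mut().unwrap().push(byte);
    }

    /// Copying variant: materializes every byte into `output`, so peak
    /// memory grows with the size of the whole list.
    pub fn read_to_end(&self, output: &mut Vec<u8>) {
        for block in &self.blocks {
            output.extend_from_slice(block);
        }
    }

    /// Streaming variant: walks the blocks lazily without a second buffer.
    pub fn iter_bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.blocks.iter().flat_map(|block| block.iter().copied())
    }
}
```

Both variants yield the same bytes in the same order. The difference is that `read_to_end` needs a buffer as large as the whole posting list, while `iter_bytes` only borrows the existing blocks.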
Codecov Report
Merging #1319 (f453a6f) into main (46d5de9) will increase coverage by 0.02%. The diff coverage is 100.00%.
Coverage Diff

| | main | #1319 | +/- |
|---|---|---|---|
| Coverage | 94.25% | 94.27% | +0.02% |
| Files | 232 | 232 | |
| Lines | 40801 | 40790 | -11 |
| Hits | 38457 | 38456 | -1 |
| Misses | 2344 | 2334 | -10 |

| Impacted Files | Coverage Δ | |
|---|---|---|
| common/src/lib.rs | 89.33% <ø> (ø) | |
| common/src/vint.rs | 92.34% <100.00%> (-0.01%) | :arrow_down: |
| src/postings/recorder.rs | 98.26% <100.00%> (-0.30%) | :arrow_down: |
| src/postings/stacker/expull.rs | 99.10% <100.00%> (+0.05%) | :arrow_up: |
| src/store/index/mod.rs | 98.37% <0.00%> (+0.54%) | :arrow_up: |
| src/indexer/segment_updater.rs | 95.93% <0.00%> (+1.04%) | :arrow_up: |
| src/fastfield/serializer/mod.rs | 92.75% <0.00%> (+1.44%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
I prefer to work with a &[u8]. It makes it much easier to optimize things.
Did you observe a performance regression / improvement? Did it shave off the memory peaks you observed before during indexing?
> Did you observe a performance regression / improvement? Did it shave off the memory peaks you observed before during indexing?
I didn't see an impact on indexing performance.
Before the change I noticed a big allocation (33.6MB) caused by read_to_end; with this change it is gone. That makes sense, since the posting lists can get huge for some terms.
> I prefer to work with a &[u8]. It makes it much easier to optimize things.
For decompressing a single vint, an iterator was already used. I agree that a &[u8] is easier to optimize. What I would prefer here is to complete each block on a correct vint boundary, so that we can create a vint iterator over the blocks.
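A rough sketch of what such a per-block vint iterator could look like is below. It assumes a 7-bits-per-byte, little-endian encoding in which the high bit marks the last byte of a value; that convention, along with the names `write_vint`, `VIntIter`, and `vints_in_block`, is an assumption of this example and should be checked against common/src/vint.rs rather than taken as the existing API.

```rust
/// Assumed convention for this sketch: the high bit flags the final byte.
const STOP_BIT: u8 = 128;

/// Encodes `value` as a variable-length integer, 7 payload bits per byte.
pub fn write_vint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let low_bits = (value & 127) as u8;
        value >>= 7;
        if value == 0 {
            out.push(low_bits | STOP_BIT);
            return;
        }
        out.push(low_bits);
    }
}

/// Decodes vints lazily from any byte iterator.
pub struct VIntIter<I> {
    bytes: I,
}

impl<I: Iterator<Item = u8>> Iterator for VIntIter<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let mut value = 0u32;
        let mut shift = 0u32;
        for byte in self.bytes.by_ref() {
            if shift > 28 {
                // More than five bytes cannot be a valid u32 vint.
                return None;
            }
            value |= u32::from(byte & 127) << shift;
            if byte & STOP_BIT != 0 {
                return Some(value);
            }
            shift += 7;
        }
        // Input ended before a stop bit (or was empty): no complete value.
        None
    }
}

/// Decodes one block on its own. This only works if the block ends on a
/// vint boundary, which is the property proposed in the comment above.
pub fn vints_in_block(block: &[u8]) -> VIntIter<std::iter::Copied<std::slice::Iter<'_, u8>>> {
    VIntIter { bytes: block.iter().copied() }
}
```

If every block ends on a vint boundary, each block can be handed to `vints_in_block` as a plain &[u8] and decoded independently. A value that straddles two blocks would be dropped by this sketch, which is why the boundary alignment matters.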