tantivy
Add piecewise linear codec, deprecate linear and multilinear, complete benchmark with real-world datasets.
Ok PR is ready for review.
Here is a summary of the PR:
- Piecewise linear codec added: basically it's a refactor and it fixes #1215
- #1212 partially fixed that by saving 8 bytes.
- Linear interpolation and multilinear interpolation deprecated. They are now kept only for the reading part, so that users can still read fast fields written with these codecs.
- FOR codec added behind the `unstable` feature; it's not used in tantivy for the moment.
- The benchmark now also measures compression time and reading time, and runs on real-world datasets.
Codecov Report
Merging #1217 (53785d5) into main (c503c6e) will decrease coverage by 0.20%. The diff coverage is 82.59%.
@@ Coverage Diff @@
## main #1217 +/- ##
==========================================
- Coverage 94.04% 93.84% -0.21%
==========================================
Files 205 207 +2
Lines 34525 35023 +498
==========================================
+ Hits 32470 32868 +398
- Misses 2055 2155 +100
Impacted Files | Coverage Δ | |
---|---|---|
fastfield_codecs/src/main.rs | 0.54% <0.00%> (-0.40%) | :arrow_down: |
fastfield_codecs/src/linearinterpol.rs | 95.15% <92.85%> (-4.39%) | :arrow_down: |
fastfield_codecs/src/frame_of_reference.rs | 94.70% <94.70%> (ø) | |
src/fastfield/serializer/mod.rs | 92.94% <95.00%> (+2.94%) | :arrow_up: |
fastfield_codecs/src/piecewise_linear.rs | 97.64% <97.64%> (ø) | |
fastfield_codecs/src/bitpacked.rs | 100.00% <100.00%> (ø) | |
fastfield_codecs/src/lib.rs | 96.92% <100.00%> (+0.04%) | :arrow_up: |
fastfield_codecs/src/multilinearinterpol.rs | 97.11% <100.00%> (-0.93%) | :arrow_down: |
src/fastfield/mod.rs | 92.30% <100.00%> (-0.02%) | :arrow_down: |
src/fastfield/reader.rs | 81.74% <100.00%> (-13.17%) | :arrow_down: |
... and 35 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c503c6e...53785d5.
@PSeitz @fulmicoton: here is my take on what to do for the next tantivy release and what to do after:
These remarks are valid for integers only (for floats, our codecs are useless):
- piecewise linear interpolation should have the best compression ratio for monotonically increasing series
- bitpacking should have the best compression ratio on random data
- frame of reference (FOR) should have the best compression for series in between
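To make the three claims above concrete, here is a minimal sketch that estimates the payload size of each approach on an integer series. It is not the code from this PR: the function names (`bits_for`, `bitpacked_bits`, `for_total_bits`, `piecewise_linear_total_bits`), the block length, and the choice to ignore per-block metadata are all assumptions made for illustration.

```rust
/// Number of bits needed to represent `max` (0 needs 0 bits).
fn bits_for(max: u128) -> u32 {
    128 - max.leading_zeros()
}

/// Plain bitpacking: each value is stored as an offset from the global minimum.
/// Returns the bit width per value. Assumes `values` is non-empty.
fn bitpacked_bits(values: &[u64]) -> u32 {
    let min = *values.iter().min().unwrap();
    let max = *values.iter().max().unwrap();
    bits_for((max - min) as u128)
}

/// Frame of reference: bitpacking with a separate minimum and bit width per block.
/// Returns the total payload in bits (per-block metadata is ignored).
/// Assumes `block_len > 0`.
fn for_total_bits(values: &[u64], block_len: usize) -> u64 {
    values
        .chunks(block_len)
        .map(|block| bitpacked_bits(block) as u64 * block.len() as u64)
        .sum()
}

/// Piecewise linear: per block, draw a line from the first to the last value
/// and store only the residuals, shifted so that they are non-negative.
/// Returns the total payload in bits (per-block metadata is ignored).
/// Assumes `block_len > 0`.
fn piecewise_linear_total_bits(values: &[u64], block_len: usize) -> u64 {
    values
        .chunks(block_len)
        .map(|block| {
            let n = block.len();
            let first = block[0] as i128;
            let last = block[n - 1] as i128;
            let residuals: Vec<i128> = block
                .iter()
                .enumerate()
                .map(|(i, &value)| {
                    let predicted = if n > 1 {
                        first + (last - first) * i as i128 / (n as i128 - 1)
                    } else {
                        first
                    };
                    value as i128 - predicted
                })
                .collect();
            let min = *residuals.iter().min().unwrap();
            let max = *residuals.iter().max().unwrap();
            bits_for((max - min) as u128) as u64 * n as u64
        })
        .sum()
}
```

On a perfectly linear increasing series the residuals are all zero, so the piecewise linear estimate drops to zero payload bits, while FOR still pays for the spread inside each block and plain bitpacking pays for the spread of the whole series.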
What can we do for the next tantivy release?
- unplug linear interpolation and replace multilinear interpolation with the new piecewise linear interpolation (the bug fix slightly improves the compression ratio, but nothing noteworthy)
- update the benchmark with new datasets
- I don't want to plug in the FOR codec immediately because I think we can do slightly better. I would also like to activate piecewise interpolation only on monotonically increasing series; I would introduce that later.
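As a rough illustration of the last point, gating piecewise linear interpolation on monotonicity could look like the sketch below. The `CodecChoice` enum and both functions are hypothetical names, not part of tantivy or of this PR, and "monotonically increasing" is assumed here to allow equal consecutive values.

```rust
/// Hypothetical codec selection result, for illustration only.
#[derive(Debug, PartialEq)]
enum CodecChoice {
    PiecewiseLinear,
    Bitpacked,
}

/// Returns true if no value is smaller than its predecessor.
/// Assumption: equal consecutive values still count as increasing.
fn is_monotonically_increasing(values: &[u64]) -> bool {
    values.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Only consider piecewise linear interpolation for monotonic series,
/// and fall back to bitpacking otherwise.
fn choose_codec(values: &[u64]) -> CodecChoice {
    if is_monotonically_increasing(values) {
        CodecChoice::PiecewiseLinear
    } else {
        CodecChoice::Bitpacked
    }
}
```

The check is a single pass over the column, so it adds little to the cost of estimating each codec's compression ratio.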
Future work:
- finalize the FOR codecs, activate piecewise linear only for monotonically increasing series
- sparse codecs
- float codecs
@fmassot agreeing with the above.
Codecov Report
Merging #1217 (14d5385) into main (447811c) will decrease coverage by 0.26%. The diff coverage is 82.67%.
@@ Coverage Diff @@
## main #1217 +/- ##
==========================================
- Coverage 94.23% 93.97% -0.27%
==========================================
Files 232 234 +2
Lines 40850 41414 +564
==========================================
+ Hits 38496 38919 +423
- Misses 2354 2495 +141
Impacted Files | Coverage Δ | |
---|---|---|
fastfield_codecs/src/main.rs | 0.54% <0.00%> (-0.40%) | :arrow_down: |
fastfield_codecs/src/linearinterpol.rs | 95.15% <92.85%> (-4.39%) | :arrow_down: |
fastfield_codecs/src/frame_of_reference.rs | 94.73% <94.73%> (ø) | |
src/fastfield/serializer/mod.rs | 92.85% <95.23%> (+1.55%) | :arrow_up: |
fastfield_codecs/src/piecewise_linear.rs | 97.63% <97.63%> (ø) | |
fastfield_codecs/src/bitpacked.rs | 100.00% <100.00%> (ø) | |
fastfield_codecs/src/lib.rs | 96.92% <100.00%> (+0.04%) | :arrow_up: |
fastfield_codecs/src/multilinearinterpol.rs | 97.08% <100.00%> (-1.93%) | :arrow_down: |
src/fastfield/mod.rs | 92.93% <100.00%> (-0.02%) | :arrow_down: |
src/fastfield/reader.rs | 76.37% <100.00%> (-12.70%) | :arrow_down: |
... and 11 more | | |
Powered by Codecov. Last update 447811c...14d5385.