
Add piecewise linear codec, deprecate linear and multilinear, complete benchmark with real-world datasets.

Open fmassot opened this issue 3 years ago • 4 comments

Ok PR is ready for review.

Here is a summary of the PR:

  • Piecewise linear codec added: it is essentially a refactor, and it fixes #1215.
  • #1212 had already partially fixed that issue by saving 8 bytes.
  • Linear interpolation and multilinear interpolation are deprecated. They remain only for the reading part, so that users can still read fast fields written with these codecs.
  • FOR (frame of reference) codec added behind the unstable feature; it is not used in tantivy for the moment.
  • The benchmark now also measures compression time and reading time, and it runs on real-world datasets.
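To make the idea behind piecewise linear interpolation concrete, here is a minimal sketch. It is not the code from this PR: the block size, the `Block` struct, and the `encode`/`decode` functions are hypothetical names chosen for illustration. Each block is approximated by a line through its first and last value, and only the deviations from that line are kept, shifted so that they are non-negative and could be bitpacked.

```rust
/// Hypothetical block size; the codec in the PR may use a different value.
const BLOCK_SIZE: usize = 4;

/// One encoded block (illustrative layout, not the on-disk format).
#[derive(Debug)]
pub struct Block {
    pub first: i128,         // first value of the block
    pub slope_num: i128,     // last - first
    pub slope_den: i128,     // block length - 1, at least 1
    pub offset: i128,        // smallest signed residual of the block
    pub num_bits: u32,       // bits needed per shifted residual
    pub residuals: Vec<u64>, // residual - offset, always >= 0
}

/// Value predicted by the line at position `i` inside a block.
/// Integer division is used identically by encoder and decoder.
fn predict(first: i128, slope_num: i128, slope_den: i128, i: usize) -> i128 {
    first + slope_num * i as i128 / slope_den
}

pub fn encode(values: &[u64]) -> Vec<Block> {
    values
        .chunks(BLOCK_SIZE)
        .map(|chunk| {
            let first = chunk[0] as i128;
            let last = chunk[chunk.len() - 1] as i128;
            let slope_num = last - first;
            let slope_den = (chunk.len() as i128 - 1).max(1);
            // Signed distance between each value and the interpolated line.
            let signed: Vec<i128> = chunk
                .iter()
                .enumerate()
                .map(|(i, &v)| v as i128 - predict(first, slope_num, slope_den, i))
                .collect();
            let offset = *signed.iter().min().unwrap();
            // Assumption of this sketch: the residual range fits in 64 bits.
            let residuals: Vec<u64> = signed
                .iter()
                .map(|r| u64::try_from(r - offset).expect("residual range exceeds 64 bits"))
                .collect();
            let max_residual = *residuals.iter().max().unwrap();
            let num_bits = 64 - max_residual.leading_zeros();
            Block { first, slope_num, slope_den, offset, num_bits, residuals }
        })
        .collect()
}

pub fn decode(blocks: &[Block]) -> Vec<u64> {
    blocks
        .iter()
        .flat_map(|b| {
            b.residuals.iter().enumerate().map(move |(i, &r)| {
                (predict(b.first, b.slope_num, b.slope_den, i) + b.offset + r as i128) as u64
            })
        })
        .collect()
}
```

With the input `[100, 110, 120, 130, 1000, 1003, 1001, 1010]`, the first block lies exactly on its line and needs 0 bits per residual, while the second block needs 3 bits per residual. A perfectly linear run therefore costs almost nothing beyond the per-block header.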

fmassot, Nov 25 '21 11:11

Codecov Report

Merging #1217 (53785d5) into main (c503c6e) will decrease coverage by 0.20%. The diff coverage is 82.59%.


@@            Coverage Diff             @@
##             main    #1217      +/-   ##
==========================================
- Coverage   94.04%   93.84%   -0.21%     
==========================================
  Files         205      207       +2     
  Lines       34525    35023     +498     
==========================================
+ Hits        32470    32868     +398     
- Misses       2055     2155     +100     
| Impacted Files | Coverage Δ |
|---|---|
| fastfield_codecs/src/main.rs | 0.54% <0.00%> (-0.40%) ↓ |
| fastfield_codecs/src/linearinterpol.rs | 95.15% <92.85%> (-4.39%) ↓ |
| fastfield_codecs/src/frame_of_reference.rs | 94.70% <94.70%> (ø) |
| src/fastfield/serializer/mod.rs | 92.94% <95.00%> (+2.94%) ↑ |
| fastfield_codecs/src/piecewise_linear.rs | 97.64% <97.64%> (ø) |
| fastfield_codecs/src/bitpacked.rs | 100.00% <100.00%> (ø) |
| fastfield_codecs/src/lib.rs | 96.92% <100.00%> (+0.04%) ↑ |
| fastfield_codecs/src/multilinearinterpol.rs | 97.11% <100.00%> (-0.93%) ↓ |
| src/fastfield/mod.rs | 92.30% <100.00%> (-0.02%) ↓ |
| src/fastfield/reader.rs | 81.74% <100.00%> (-13.17%) ↓ |

... and 35 more


Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c503c6e...53785d5. Read the comment docs.

codecov-commenter, Nov 25 '21 11:11

@PSeitz @fulmicoton: here is my take on what to do for the next tantivy release and what to do afterwards:

These remarks are valid for integers only (for floats, our codecs are useless):

  • piecewise linear interpolation should have the best compression ratio for monotonically increasing series
  • bitpacking should have the best compression ratio on random stuff
  • frame of reference (FOR) should have the best compression for series in between
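The trade-off between plain bitpacking and frame of reference can be estimated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, bitpacking is assumed to store each value relative to the minimum of its range, and per-block headers (minimum and bit width) are ignored.

```rust
/// Number of bits needed to represent `max_value` (0 needs 0 bits).
fn bits_needed(max_value: u64) -> u32 {
    64 - max_value.leading_zeros()
}

/// Bits per value when the whole slice is bitpacked relative to its minimum.
pub fn bitpacked_bits(values: &[u64]) -> u32 {
    let min = values.iter().copied().min().unwrap_or(0);
    let max = values.iter().copied().max().unwrap_or(0);
    bits_needed(max - min)
}

/// Total payload bits when every block of `block_size` values is bitpacked
/// relative to its own minimum (frame of reference).
/// `block_size` must be non-zero; headers are not counted.
pub fn for_total_bits(values: &[u64], block_size: usize) -> u64 {
    values
        .chunks(block_size)
        .map(|block| bitpacked_bits(block) as u64 * block.len() as u64)
        .sum()
}
```

For `[10, 12, 11, 13, 1_000_000, 1_000_002, 1_000_001, 1_000_003]`, global bitpacking needs 20 bits per value (160 bits in total), whereas FOR with blocks of 4 needs 2 bits per value (16 bits of payload). The values are not monotonic inside a block, which is the kind of "in between" series where FOR is expected to do well.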

What can we do for the next tantivy release?

  • unplug linear interpolation and replace multilinear interpolation with the new piecewise linear interpolation (the bug fix increases the compression ratio a little, but nothing noteworthy)
  • update the benchmark with new datasets
  • I don't want to plug in the FOR codec immediately because I think we can do slightly better. I would also like to activate piecewise linear interpolation only on monotonically increasing series; I would introduce that later.

Future work:

  • finalize the FOR codecs, activate piecewise linear only for monotonically increasing series
  • sparse codecs
  • float codecs

fmassot, Dec 06 '21 08:12

@fmassot agreeing with the above.

fulmicoton, Dec 07 '21 09:12

Codecov Report

Merging #1217 (14d5385) into main (447811c) will decrease coverage by 0.26%. The diff coverage is 82.67%.

@@            Coverage Diff             @@
##             main    #1217      +/-   ##
==========================================
- Coverage   94.23%   93.97%   -0.27%     
==========================================
  Files         232      234       +2     
  Lines       40850    41414     +564     
==========================================
+ Hits        38496    38919     +423     
- Misses       2354     2495     +141     
| Impacted Files | Coverage Δ |
|---|---|
| fastfield_codecs/src/main.rs | 0.54% <0.00%> (-0.40%) ↓ |
| fastfield_codecs/src/linearinterpol.rs | 95.15% <92.85%> (-4.39%) ↓ |
| fastfield_codecs/src/frame_of_reference.rs | 94.73% <94.73%> (ø) |
| src/fastfield/serializer/mod.rs | 92.85% <95.23%> (+1.55%) ↑ |
| fastfield_codecs/src/piecewise_linear.rs | 97.63% <97.63%> (ø) |
| fastfield_codecs/src/bitpacked.rs | 100.00% <100.00%> (ø) |
| fastfield_codecs/src/lib.rs | 96.92% <100.00%> (+0.04%) ↑ |
| fastfield_codecs/src/multilinearinterpol.rs | 97.08% <100.00%> (-1.93%) ↓ |
| src/fastfield/mod.rs | 92.93% <100.00%> (-0.02%) ↓ |
| src/fastfield/reader.rs | 76.37% <100.00%> (-12.70%) ↓ |

... and 11 more


Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 447811c...14d5385. Read the comment docs.

codecov-commenter, Mar 26 '22 20:03