Internally compress index bytes

Open jameslittle230 opened this issue 4 years ago • 2 comments

This PR creates a new index serialization format that all new indexes are built with. A new configuration option lets users specify that they want the index data to be compressed (here using bzip2) when written to the output file.

This shrank the Federalist Papers index by a factor of 5.

Reasons this shouldn't get merged yet:

  • [ ] Still some todos in the code to be resolved
  • [ ] I'm unsure of the implications for end-user gzipping. Would it be more reliable if the user just gzipped the file themselves?
  • [ ] Most importantly, this freezes the web browser while indexes are loading. The WASM binary has to decompress the data before loading it into memory, and because the WASM executes on the browser's UI thread, the page is frozen for the several seconds the decompression takes. This is an unacceptable user experience, and this PR will probably have to wait until Stork successfully supports running on a web worker. (Fortunately, because of the way the data is saved, the page only has to perform that decompression once upon page load, not once per search.)
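
To make the opt-in compression and the "decompress once on load" behavior concrete, here is a minimal sketch and not Stork's actual serialization format. It assumes a hypothetical envelope in which one tag byte records whether the payload is compressed. The names `write_index`, `read_index`, `TAG_RAW`, and `TAG_COMPRESSED` are invented for illustration, and a simple run-length encoder stands in for bzip2 so that the example needs no external crates.

```rust
// Hypothetical envelope: one tag byte, then the payload.
const TAG_RAW: u8 = 0;
const TAG_COMPRESSED: u8 = 1;

/// Stand-in codec (run-length encoding, NOT bzip2).
/// Each run of identical bytes becomes a (count, byte) pair, count <= 255.
fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_compress`. Returns None if the payload is malformed.
fn rle_decompress(input: &[u8]) -> Option<Vec<u8>> {
    if input.len() % 2 != 0 {
        return None;
    }
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    Some(out)
}

/// Build step: `compress` plays the role of the new configuration option.
fn write_index(index_bytes: &[u8], compress: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity(index_bytes.len() + 1);
    if compress {
        out.push(TAG_COMPRESSED);
        out.extend(rle_compress(index_bytes));
    } else {
        out.push(TAG_RAW);
        out.extend_from_slice(index_bytes);
    }
    out
}

/// Load step: inspect the tag and decode the payload exactly once.
fn read_index(file_bytes: &[u8]) -> Option<Vec<u8>> {
    let (tag, payload) = file_bytes.split_first()?;
    match *tag {
        TAG_RAW => Some(payload.to_vec()),
        TAG_COMPRESSED => rle_decompress(payload),
        _ => None, // unknown tag: refuse to guess
    }
}
```

In this sketch a caller would invoke `read_index` once when the index file arrives and keep the returned bytes in memory, so every later search runs against already-decoded data. The cost of that single call is what blocks the UI thread in the scenario described above.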

jameslittle230, Mar 29 '22 23:03

Codecov Report

Merging #280 (d42905f) into master (eeaca67) will decrease coverage by 10.67%. The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #280       +/-   ##
===========================================
- Coverage   72.44%   61.77%   -10.68%     
===========================================
  Files          53       15       -38     
  Lines        2174      518     -1656     
  Branches      104      104               
===========================================
- Hits         1575      320     -1255     
+ Misses        598      197      -401     
  Partials        1        1               
Impacted Files Coverage Δ
stork-cli/src/clap.rs
stork-cli/src/main.rs
stork-lib/src/config/mod.rs
stork-lib/src/index_v2/mod.rs
stork-lib/src/index_v3/build/errors.rs
stork-lib/src/index_v3/build/fill_containers.rs
...rc/index_v3/build/fill_intermediate_entries/mod.rs
stork-lib/src/index_v3/build/fill_stems.rs
stork-lib/src/index_v3/build/mod.rs
stork-lib/src/index_v3/mod.rs
... and 28 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c1463e3...d42905f.

codecov[bot], Mar 29 '22 23:03

Benchmarks

Benchmark                  Baseline   Contender  Comparison
build/federalist           229.5537   213.1282   0.93× 🎉
federalist.st              1125.456   271.075    0.24× 🎉
search/federalist/liberty  1.9478     1.9842     1.02×
stork.js                   21.961     21.88      1.0×
stork.wasm                 356.537    651.157    1.83× ⚠️

Baseline: de70fb01688725b7955aa8a48b4fda7ef8be7993; Comparison: d42905f4b278949c40a002508c550e5c7719e2dd
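
The Comparison column appears to be the contender value divided by the baseline value, rounded to two decimal places. That reading is an assumption, but it matches the reported rows. The helper below is a hypothetical illustration and not part of the benchmark tooling.

```rust
/// Ratio of contender to baseline, rounded to two decimal places.
/// Assumption: this is how the Comparison column is derived.
fn comparison(baseline: f64, contender: f64) -> f64 {
    (contender / baseline * 100.0).round() / 100.0
}

/// Inverse view: how many times smaller (or faster) the contender is.
fn reduction_factor(baseline: f64, contender: f64) -> f64 {
    baseline / contender
}
```

Under this reading, `comparison(1125.456, 271.075)` reproduces the 0.24 reported for federalist.st, and `reduction_factor` for the same row is a little over 4. The stork.wasm row works out to 1.83, meaning the contender value is nearly double the baseline.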

github-actions[bot], Mar 29 '22 23:03

Closing due to staleness

jameslittle230, Sep 30 '22 15:09