Internally compress index bytes
This PR introduces a new index serialization format that all newly built indexes use. A new configuration option lets users request that the index data be compressed (using bzip2) when it is written to the output file.
This shrank the Federalist Papers index by roughly 5x.
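To make the idea of an opt-in compressed format concrete, here is a minimal Python sketch of one way such a format could work. It assumes a small fixed header whose flag bit records whether the payload is bzip2-compressed. The magic bytes, header layout, and the `write_index`/`read_index` names are hypothetical and are not Stork's actual format or API; Python is used only because its standard library ships `bz2`.

```python
import bz2
import struct

# Hypothetical header layout (NOT Stork's real format):
#   4-byte magic, 1-byte version, 1-byte flags, 8-byte little-endian payload length
MAGIC = b"STIX"      # hypothetical magic bytes
VERSION = 1
FLAG_BZIP2 = 0x01    # bit set when the payload is bzip2-compressed
HEADER = struct.Struct("<4sBBQ")


def write_index(index_bytes: bytes, compress: bool) -> bytes:
    """Wrap serialized index bytes, optionally bzip2-compressing the payload."""
    payload = bz2.compress(index_bytes) if compress else index_bytes
    flags = FLAG_BZIP2 if compress else 0
    return HEADER.pack(MAGIC, VERSION, flags, len(payload)) + payload


def read_index(blob: bytes) -> bytes:
    """Validate the header and return the original (uncompressed) index bytes."""
    magic, version, flags, length = HEADER.unpack_from(blob)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unrecognized index format")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated index payload")
    # Only decompress when the writer recorded that it compressed the payload.
    return bz2.decompress(payload) if flags & FLAG_BZIP2 else payload
```

Recording the compression choice in the file itself means a reader can load both compressed and uncompressed indexes without knowing which configuration option was used at build time.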
Reasons this shouldn't get merged yet:
- [ ] Some TODOs in the code still need to be resolved.
- [ ] I'm unsure how this interacts with end users gzipping the index file themselves. Would it be more reliable to leave compression to the user and have them gzip the file?
- [ ] Most importantly, this freezes the web browser when an index is loaded. The WASM binary has to decompress the data before loading it into memory, and because the WASM executes on the browser's UI thread, the page is unresponsive for the several seconds decompression takes. This is an unacceptable user experience, so this PR will probably have to wait until Stork supports running in a web worker. (Fortunately, because of the way the data is saved, decompression happens only once per page load, not once per search.)
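The last item notes that decompression happens once per page load rather than once per search. A minimal Python sketch of that load-once pattern, assuming the decompressed bytes are simply cached after first use; `LoadedIndex` and its substring-counting `search` are hypothetical stand-ins, not Stork's search code. It does not fix the blocking problem itself, which still requires running the decompression off the UI thread.

```python
import bz2


class LoadedIndex:
    """Hypothetical wrapper: decompress once on first use, reuse for every search."""

    def __init__(self, compressed: bytes):
        self._compressed = compressed
        self._data = None          # decompressed bytes, filled on first use
        self.decompressions = 0    # counter, only to show the work happens once

    def _ensure_loaded(self) -> bytes:
        if self._data is None:
            self._data = bz2.decompress(self._compressed)
            self.decompressions += 1
        return self._data

    def search(self, term: str) -> int:
        # Stand-in for a real index lookup: count occurrences in the raw bytes.
        return self._ensure_loaded().count(term.encode("utf-8"))
```

The one-time cost is paid on the first search (or eagerly at load), and every later query works on the cached, already-decompressed data.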
Codecov Report
Merging #280 (d42905f) into master (eeaca67) will decrease coverage by 10.67%. The diff coverage is n/a.
Coverage Diff:

| | master | #280 | +/- |
|---|---|---|---|
| Coverage | 72.44% | 61.77% | -10.68% |
| Files | 53 | 15 | -38 |
| Lines | 2174 | 518 | -1656 |
| Branches | 104 | 104 | |
| Hits | 1575 | 320 | -1255 |
| Misses | 598 | 197 | -401 |
| Partials | 1 | 1 | |
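As a sanity check on the coverage figures: coverage is hits divided by tracked lines, and hits, misses, and partials sum to the line count in both columns. The sketch below, with numbers copied from the coverage diff, reproduces 72.44% and 61.77% when the percentage is truncated to two decimals; ordinary rounding would give 72.45% and 61.78%, so the report appears to truncate (an inference from the numbers, not from Codecov's documentation).

```python
import math

# Figures copied from the coverage diff.
reports = {
    "master": {"hits": 1575, "misses": 598, "partials": 1, "lines": 2174},
    "#280": {"hits": 320, "misses": 197, "partials": 1, "lines": 518},
}


def coverage_pct(hits: int, lines: int) -> float:
    """Hits / lines as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


for name, r in reports.items():
    # Every tracked line is a hit, a miss, or a partial.
    assert r["hits"] + r["misses"] + r["partials"] == r["lines"]
    print(name, coverage_pct(r["hits"], r["lines"]))
```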
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c1463e3...d42905f.
Benchmarks
| Benchmark | Baseline | Contender | Comparison |
|---|---|---|---|
| build/federalist | 229.5537 | 213.1282 | 0.93× |
| federalist.st | 1125.456 | 271.075 | 0.24× |
| search/federalist/liberty | 1.9478 | 1.9842 | 1.02× |
| stork.js | 21.961 | 21.88 | 1.0× |
| stork.wasm | 356.537 | 651.157 | 1.83× ⚠️ |
Baseline: de70fb01688725b7955aa8a48b4fda7ef8be7993; Comparison: d42905f4b278949c40a002508c550e5c7719e2dd
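The Comparison column matches contender divided by baseline, rounded to two decimals. A short Python check with values copied from the benchmark table (units are whatever the benchmark tool reports; the table does not state them):

```python
# (baseline, contender) pairs copied from the benchmark table.
benchmarks = {
    "build/federalist": (229.5537, 213.1282),
    "federalist.st": (1125.456, 271.075),
    "search/federalist/liberty": (1.9478, 1.9842),
    "stork.js": (21.961, 21.88),
    "stork.wasm": (356.537, 651.157),
}


def ratio(baseline: float, contender: float) -> float:
    """Comparison column: contender / baseline, rounded to two decimals."""
    return round(contender / baseline, 2)


for name, (base, cont) in benchmarks.items():
    print(f"{name}: {ratio(base, cont)}x")
```

By this measure the compressed federalist.st is about a quarter of the baseline size (1125.456 / 271.075 ≈ 4.2), stork.wasm grows by roughly 83%, and the search/federalist/liberty benchmark is essentially unchanged.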
Closing due to staleness