stork Demo: compress index to save network bandwidth

The demo site https://stork-search.net/ downloads a 1.7M index of the Federalist Papers uncompressed over the network.

To better evaluate the impact of adding stork search to a static site, it would help to precompress said example index.

Brotli compression with default settings compresses the same index down to 322K.

The same applies to the 180K .wasm file.
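A minimal sketch of what precompressing could look like as a build step, not how the Stork demo is actually deployed. It uses Python's standard-library gzip module; the Brotli branch assumes the third-party `brotli` package is installed and is skipped otherwise. The function name `precompress` and the file names in the usage note are hypothetical.

```python
import gzip
from pathlib import Path

try:
    import brotli  # third-party package; assumed optional here
except ImportError:
    brotli = None


def precompress(path: Path) -> dict:
    """Write <path>.gz (and <path>.br if brotli is importable).

    Returns a mapping of encoding name to size in bytes so the
    savings can be compared against the uncompressed file.
    """
    data = path.read_bytes()
    sizes = {"identity": len(data)}

    gz_path = path.with_name(path.name + ".gz")
    gz_path.write_bytes(gzip.compress(data, compresslevel=9))
    sizes["gzip"] = gz_path.stat().st_size

    if brotli is not None:
        br_path = path.with_name(path.name + ".br")
        br_path.write_bytes(brotli.compress(data))  # library default settings
        sizes["br"] = br_path.stat().st_size

    return sizes
```

Calling something like `precompress(Path("federalist.st"))` and `precompress(Path("stork.wasm"))` (file names assumed) would leave the originals untouched and write the compressed variants next to them.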

Mar 07 '20 16:03 cloudspeech

That's a solid plan. I briefly looked into different compression options, because I quickly realized that for whatever reason these index files are highly compressible. I'll close this issue when the CDN starts doing that compression!

Mar 07 '20 20:03 jameslittle230

Hey @jameslittle230, out of curiosity: have you considered using fst as the data structure for your index?

Apr 06 '20 17:04 ngirard

@ngirard I have not (hadn’t heard of it), and from a quick glance, I can’t tell what the benefit would be. Can you tell me more about how it might make Stork better?

Apr 06 '20 18:04 jameslittle230

@jameslittle230, TBH I haven't given it much thought. It just popped into my mind as I was skimming through your project's pages and read that your index files were large. Since fst can produce compact indices, I thought it could help, but I might very well be wrong!

Apr 06 '20 18:04 ngirard

@ngirard - that makes sense. I was taking another look at it last night and was trying to think about where it slots in — I have some ideas that I want to try out.

Thanks for letting me know about the library — much appreciated. :)

Apr 07 '20 16:04 jameslittle230

@jameslittle230, in any case I'm glad I introduced you to this nice crate.

And thank you for investing your time into this nice project of yours.

Cheers & take care!

Apr 07 '20 16:04 ngirard

Coming back to the compression aspect of this: it looks like I'll have to:

1. Set up infrastructure to compress the files automatically on every deploy
2. Use that infrastructure to upload uncompressed, gzipped, and brotli'd files to S3
3. Write (or borrow) a Lambda@Edge function that switches on the incoming Accept-Encoding header and rewrites the requested URI so the correct file is served from the S3 bucket
4. Test that requests with different Accept-Encoding headers receive different bytes over the wire
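The URI-rewriting step could be sketched roughly as below. This is an illustration under assumptions, not the function used for the Stork demo: it assumes the Python runtime for Lambda@Edge, that the compressed variants are stored with `.br` and `.gz` suffixes, and that only `.st` and `.wasm` files were precompressed. The names `pick_encoding`, `SUFFIX`, and `PRECOMPRESSED` are hypothetical, and wildcard (`*`) entries in the header are ignored for brevity.

```python
# Hypothetical naming scheme: compressed copies sit next to the original.
SUFFIX = {"br": ".br", "gzip": ".gz", "": ""}
# Assumed set of file extensions that were precompressed at deploy time.
PRECOMPRESSED = (".st", ".wasm")


def pick_encoding(accept_encoding: str) -> str:
    """Return 'br', 'gzip', or '' (uncompressed) for an Accept-Encoding value."""
    offered = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        offered[token] = quality
    # Server-side preference: Brotli first, then gzip.
    for encoding in ("br", "gzip"):
        if offered.get(encoding, 0.0) > 0:
            return encoding
    return ""


def handler(event, context):
    """Rewrite the request URI to the best precompressed variant."""
    request = event["Records"][0]["cf"]["request"]
    if request["uri"].endswith(PRECOMPRESSED):
        values = request["headers"].get("accept-encoding", [])
        accept = ",".join(item["value"] for item in values)
        request["uri"] += SUFFIX[pick_encoding(accept)]
    return request
```

For the browser to decode the response, the compressed objects in S3 would also need matching `Content-Encoding` and `Content-Type` metadata, and the cache would need to vary on `Accept-Encoding`; both are outside this sketch.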

Apr 21 '20 03:04 jameslittle230

CloudFront has native Brotli support. Is that not supported for your file types for any reason?

Feb 21 '21 14:02 monken

@monken - the WASM file and the index are not served with the MIME types in the File types that CloudFront Compresses list.

Feb 21 '21 19:02 jameslittle230

The index size is concerning to me as I consider using this. The size of the first 20 Federalist Papers is only 241KB. If the index is 1.13MB then the index is over 4 times the size of the indexed data.

I have a static site for a book where the text data is 1.1MB broken up into 22 files. If integrating search would add about 5MB to the page load size, it seems prohibitive.
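The estimate above works out as follows. The last two lines are an extra assumption, not something measured: they apply the Brotli ratio reported at the top of this thread (322K from 1.7M) to the projected index, which may not hold for other text.

```python
# Figures quoted in this thread, in decimal kilobytes / megabytes.
index_kb = 1130          # 1.13MB demo index
source_kb = 241          # first 20 Federalist Papers
ratio = index_kb / source_kb

book_mb = 1.1            # text size of the book site
estimated_index_mb = book_mb * ratio

# Assumption: the same Brotli ratio as reported for the demo index.
brotli_ratio = 322 / 1700
compressed_estimate_mb = estimated_index_mb * brotli_ratio

print(f"index / source ratio: {ratio:.2f}")
print(f"projected index size: {estimated_index_mb:.2f} MB")
print(f"projected size with Brotli: {compressed_estimate_mb:.2f} MB")
```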

On a related note, why are you only searching the first 20 papers? Is it because Stork can't handle the whole thing for some reason?

Apr 02 '21 17:04 jtbayly

@jtbayly - It's a fair point! Over time, I hope to be able to make improvements to the index file format to reduce their size.

The 20-file limit in the demo, though, is unrelated. To build the demo, I manually pulled each paper from the source and cleaned up the text by hand. I stopped doing that once I reached 20 because it ended up taking more of my time than it was worth.

Apr 03 '21 01:04 jameslittle230

It might be a better idea to add fflate support (small and fast gzip/zip/deflate compressor and decompressor) so no extra server configuration is needed, since stork seems to be mostly for static sites.

Jan 12 '22 01:01 easrng
