Improve compression support
- [ ] Checksum support (<- Great first issue!)
- [ ] Check if/how much building custom FSE tables improves the compression rate and what the performance implications are
- [ ] Check if/how much rebuilding huffman tables for each block helps/hurts performance and ratio
- [ ] Dictionary support
- [ ] Check other matching algorithms and window sizes to implement more compression levels
- [x] Make the `Matcher` trait public and allow users to provide their own implementations
This is awesome, I've been swamped with schoolwork so haven't really had time to finish up my compression addition.
Is there anywhere that my help would be valued? Once this is merged into master I can look into restructuring everything to be more organized, or:
- I want to add a lot more to the CI, ala https://github.com/jonhoo/rust-ci-conf
- I can add more documentation
- I want to try experimenting with making the bitreader flip chunks using SIMD, I think it would lead to a real performance speedup
One thing that would be nice to have is actually streaming the input. Currently I am being cheap and just reading the source into a Vec and then processing it as if it was passed as a parameter. That's obviously just a crutch and needs to change so large sources can be processed with efficient memory usage.
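Roughly the shape that could take, as a minimal sketch; `compress_block` here is just a stand-in for the real per-block path:

```rust
use std::io::Read;

// Stand-in for the real per-block compression logic.
fn compress_block(chunk: &[u8], output: &mut Vec<u8>) {
    let _ = (chunk, output);
}

const BLOCK_SIZE: usize = 128 * 1024;

// Pull the source in block-sized chunks instead of buffering the whole input.
fn compress_streaming<R: Read>(mut source: R, output: &mut Vec<u8>) -> std::io::Result<()> {
    let mut buf = vec![0u8; BLOCK_SIZE];
    loop {
        let read = source.read(&mut buf)?;
        if read == 0 {
            break;
        }
        // A real implementation would also maintain the sliding window for match
        // finding, but memory use stays bounded by the block/window size.
        compress_block(&buf[..read], output);
    }
    Ok(())
}
```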
I'd hold off with doc on the encoder part for a bit, it's likely a lot will still change and keeping doc up to date can be tricky. And wrong/outdated doc is worse than no doc in my experience.
Trying out SIMD for the bitreader sounds interesting, especially for the case where three numbers are read in one call. The bitreader is one of the hottest codepaths in the decoding, even small gains can have a huge impact. It also has the benefit of not depending on this branch being merged soon.
The really complicated thing I'm hung up on currently is designing a good algorithm for finding the matches to generate sequences. I learned about efficiently searching strings in university, but this is a bit different because the string to search in constantly changes as the data window moves. I think I found something half promising, but it's annoying to implement in safe Rust. I might put a pin in that and defer it to later to get this branch into a mergeable state.
Anyways do what you think sounds most fun to you :)
@zleyyij I merged this and did some reorganizing myself. Feedback on that would be greatly appreciated.
I removed all re-exports for now so the common stuff is now hidden in ruzstd::encoding::xxx. These should probably come back to the top level imports? Anyways the doc page is now much cleaner after hiding all the internal modules :)
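If they do come back, it would probably just be a few re-exports in lib.rs along these lines (item names are placeholders, not necessarily the final public surface):

```rust
// lib.rs sketch: keep the internals organized under encoding::*, but surface
// the common entry points at the crate root again.
pub mod encoding;

pub use encoding::CompressionLevel;
pub use encoding::FrameCompressor;
```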
Awesome, I'll take a look tomorrow :)
@KillingSpark
Here's my scratchpad:
Externally facing feedback
- I cloned the repo on my machine then ran `cargo doc --open`. The homepage feels readable and organized. A few areas of improvement you could consider:
  - If you use `#![doc = include_str!("README.md")]` in your module level documentation for `lib.rs`, then you can include the entire README in the docs page (see the sketch after this list)
  - If you really wanted to improve docs for the compression side, I really like rustdoc's documentation format, and I would try to use that for every major public facing component: https://doc.rust-lang.org/rustdoc/how-to-write-documentation.html#documenting-components
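Both of those could look roughly like this in `lib.rs` (paths and item names are illustrative only, not the crate's actual layout):

```rust
// Pull the README into the crate-level docs; the include_str! path is relative
// to this file, so with lib.rs in src/ it is usually "../README.md".
#![doc = include_str!("../README.md")]

/// One-line summary of what the item does (rustdoc's suggested opening line).
///
/// A longer description can follow, along with sections like `# Examples`,
/// `# Errors`, and `# Panics` as described in the rustdoc guide linked above.
pub struct FrameCompressor;
```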
Internal implementation feedback
compression/mod.rs
- `compression/mod.rs` is an unusual place to put the `Matcher` trait if I'm understanding what it is correctly. Do you think it would be better suited under `match_generator.rs`?
- The `Matcher` trait appears to define some domain specific language (a space), but it doesn't ever really define what a space is. Is it a heap allocation where compressed data is written?
match_generator.rs
- Module level documentation for `match_generator.rs` would be helpful, just a sentence or two explaining that it's generating matches for the literals section of a zstd frame or whatever it is
- Your implementation for `match_generator` is elegant, I like it
frame_compressor.rs
- I actually made this mistake when I was first writing the compressor stuff: I don't like that `FrameCompressor::compress` is basically a massive match chain, and I think it'll get worse and worse as more compression support is added. What do you think about having a `levels` module, and for every compression level, there's a file with a single `fn compress_LEVEL(input: R, output: &mut Vec<u8>)`? Then the match just has a `CompressionLevel::Uncompressed => compress_uncompressed(input, &mut output)` (rough sketch below)
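Very roughly, something like this; the names and signatures are made up just to show the shape, not the actual API:

```rust
use std::io::Read;

pub enum CompressionLevel {
    Uncompressed,
    Fastest,
}

// levels/uncompressed.rs: emit raw blocks only (stand-in body)
fn compress_uncompressed<R: Read>(input: &mut R, output: &mut Vec<u8>) {
    let _ = input.read_to_end(output);
}

// levels/fastest.rs: run the match generator and emit compressed blocks (stand-in body)
fn compress_fastest<R: Read>(input: &mut R, output: &mut Vec<u8>) {
    let _ = (input, output);
}

pub fn compress<R: Read>(level: CompressionLevel, mut input: R, output: &mut Vec<u8>) {
    // compress() stays a thin dispatcher instead of holding each level's logic inline
    match level {
        CompressionLevel::Uncompressed => compress_uncompressed(&mut input, output),
        CompressionLevel::Fastest => compress_fastest(&mut input, output),
    }
}
```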
bit_writer.rs
- `bit_writer` has a typo on line 10, `outpu` should be `output`.
- Some of the stuff in `bit_writer.rs` is very difficult to understand, and could benefit from comments, eg:
```rust
if idx % 8 != 0 {
    let bits_in_first_byte = 8 - (idx % 8);
    assert!(bits_in_first_byte <= num_bits);
    self.output.as_mut()[idx / 8] &= 0xFFu8 >> bits_in_first_byte;
    let new_bits = (bits << (8 - bits_in_first_byte)) as u8;
    self.output.as_mut()[idx / 8] |= new_bits;
    num_bits -= bits_in_first_byte;
    bits >>= bits_in_first_byte;
    idx += bits_in_first_byte;
}
```
The function it's under has no docstring, and there are no comments throughout the body of the function, so it's difficult to grok easily.
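For example, here's my best guess at what that branch is doing, assuming bits are packed LSB-first and `idx` counts the bits already written (I may well be misreading it, which is sort of the point):

```rust
if idx % 8 != 0 {
    // The current byte already holds `idx % 8` bits in its low positions,
    // so this many bit slots are still free in it.
    let bits_in_first_byte = 8 - (idx % 8);
    assert!(bits_in_first_byte <= num_bits);
    // Clear the unused high bits of the partially filled byte...
    self.output.as_mut()[idx / 8] &= 0xFFu8 >> bits_in_first_byte;
    // ...then place the lowest `bits_in_first_byte` new bits into those free slots.
    let new_bits = (bits << (8 - bits_in_first_byte)) as u8;
    self.output.as_mut()[idx / 8] |= new_bits;
    // Those bits are consumed: fewer remain, and the write position advances.
    num_bits -= bits_in_first_byte;
    bits >>= bits_in_first_byte;
    idx += bits_in_first_byte;
}
```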
Overall, great work, this is a very robust job you've done here. If you'd like a hand with any of these, I'm happy to help.
I'm currently working on SIMD optimization of `reverse_bitreader`. It's headache inducing, but I'm still making progress (I love reading about discrete Galois field matrix transforms /s). My current plan is to implement x86_64 (GFNI/AVX512VL) and ARM NEON support, then write a platform-agnostic fallback version. Currently it seems this will require nightly (maybe we lock it behind a feature flag, inline some assembly, or use a stopgap crate until that's stabilized?).
Thanks for the feedback! Lots of good stuff :)
I think the Matcher trait needs some work anyways, so that the MatcherDriver is generic over the Matcher and doesn't itself implement Matcher. This would allow user code to provide their own Matcher implementation if they want to. But with that in mind, I think it is in the right place, even if it is only pub(crate) with a single implementation for now.
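Very roughly the shape I have in mind; the method and field names here are made up, only the `Matcher`/`MatcherDriver` split matters:

```rust
// Hypothetical sketch: the trait user code can implement themselves...
pub trait Matcher {
    /// Find matches in the current data and report them as sequences.
    fn find_matches(&mut self, data: &[u8], report: &mut dyn FnMut(Sequence));
}

pub struct Sequence {
    pub literal_len: usize,
    pub match_len: usize,
    pub offset: usize,
}

// ...and the driver, which owns the window management and is generic over the
// Matcher instead of implementing the trait itself.
pub struct MatcherDriver<M: Matcher> {
    matcher: M,
    // window/buffer bookkeeping would live here
}

impl<M: Matcher> MatcherDriver<M> {
    pub fn new(matcher: M) -> Self {
        MatcherDriver { matcher }
    }
}
```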
The SIMD work sounds very cool! I think swapping the bitreader based on a feature flag is fine, if the feature documents that the crate is only supported on nightly with that feature enabled
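Roughly how I'd imagine the gating; the feature name is just a suggestion, and Cargo.toml would declare it under `[features]` with a doc note that it only builds on nightly:

```rust
// Pick the bitreader implementation at compile time based on a cargo feature.
// (Feature name and module layout are illustrative only.)

#[cfg(feature = "nightly_simd_bitreader")]
mod simd {
    // nightly-only intrinsics / portable SIMD implementation would live here
    pub struct ReverseBitReader;
}

#[cfg(not(feature = "nightly_simd_bitreader"))]
mod scalar {
    // the existing scalar implementation stays the default
    pub struct ReverseBitReader;
}

#[cfg(feature = "nightly_simd_bitreader")]
pub use simd::ReverseBitReader;
#[cfg(not(feature = "nightly_simd_bitreader"))]
pub use scalar::ReverseBitReader;
```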
Just figured I'd update you, currently working on implementing dictionary support. It's far from done yet, but I'll try to be more regular about providing updates.
I found the paper facebook's implementation uses, and am in the process of working through it, hopefully resulting in dictionaries that compete with dictionaries created by the reference implementation.
Is that paper publicly available? I'd be interested in reading that too :)
It is, "Effective Construction of Relative Lempel-Ziv Dictionaries" https://people.eng.unimelb.edu.au/ammoffat/abstracts/lpmw16www.pdf
As noted here: https://github.com/facebook/zstd/blob/dev/lib/dictBuilder/cover.c#L11-L18
It's a dense read, I'm working with a professor on it as a research project.
After that, I want to implement markov chain based dictionary generation, I believe it could generate higher quality dictionaries at the cost of speed.