tsz-rs icon indicating copy to clipboard operation
tsz-rs copied to clipboard

The length of the number of leading zeros probably should be 5 bits

Open lemolatoon opened this issue 1 year ago • 4 comments

I found in this crate, the number of leading zeros is encoded in 6 bits where the control bit is '11'. However, according to Gorilla's paper, it should be encoded in 5 bits.

(b) (Control bit ‘1’) Store the length of the number of leading zeros in the next 5 bits, then store the length of the meaningful XORed value in the next 6 bits. Finally store the meaningful bits of the XORed value.

The corresponding codes are here. https://github.com/jeromefroe/tsz-rs/blob/b3e2dce64707c42c10019d9159f88a0f594458af/src/encode/std_encoder.rs#L132-L138 https://github.com/jeromefroe/tsz-rs/blob/b3e2dce64707c42c10019d9159f88a0f594458af/src/decode/std_decoder.rs#L143

Is there a specific reason to use 6 bits for the leasing zeros in this crate?

lemolatoon avatar Jul 19 '24 19:07 lemolatoon

Hi @lemolatoon! It's been a while since I've worked on this crate so I don't recall for sure, but I think I may have just misread the paper and used 6 bits instead of 5 when implementing it 🤦

jeromefroe avatar Jul 20 '24 14:07 jeromefroe

@jeromefroe Thanks for your reply! I understand the circumstance. I could write some patches to fix this, but it would be a big breaking change. I will keep it. If you think this should be fixed, I'll send a Pull Request, so I would like you to let me know.

lemolatoon avatar Jul 24 '24 16:07 lemolatoon

If you want to fix it, I'd be more than happy to accept the pull request. It would just have to be released as a new version since as you noted it would be a breaking change.

jeromefroe avatar Jul 27 '24 20:07 jeromefroe

I was exercising around the Gorilla Compression. I found that If we have more than or equal to 32 bits of leading zeros, we cannot encode it in 5 bits. In that case, we should treat it as 31 bits of leading zeros, and the rest of zeroes as significant bits (in other words, meaningful bits). I found another implementation of Gorilla, and they do like that.

https://github.com/panagiotisl/chimp/blob/main/src/main/java/fi/iki/yak/ts/compression/gorilla/Compressor.java#L81-L84

I an thinking about writing patches and open pull requests for this issue when I am free.

Thank you!

lemolatoon avatar Aug 06 '24 21:08 lemolatoon