
API design: user wants low-level control for block-based compression

Open · amallia opened this issue 5 years ago · 3 comments

Hi, one thing I miss in this library is the ability to integrate it into a bigger project where I handle the block encoding manually. More specifically, it would be great to have both encodeArray and encodeBlock, so the user can decide which one to use.

If I want to encode blocks of 128 elements, I don't want to use encodeArray, because it stores the length of every block, which is redundant when the block size is fixed.
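A minimal sketch of the distinction, assuming fixed blocks of 128 unsigned 32-bit integers. The names `encodeBlock`, `decodeBlock` and `encodeArrayStyle` are hypothetical and are not FastPFor functions; the packing is plain scalar binary packing (one bit-width word followed by the packed values), chosen only to make the cost of the length header visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; these are not FastPFor functions.
constexpr std::size_t kBlockSize = 128;  // every block holds exactly 128 values

// Smallest bit width able to represent every value in the block.
uint32_t bitWidth(const uint32_t *in) {
  uint32_t acc = 0;
  for (std::size_t i = 0; i < kBlockSize; ++i) acc |= in[i];
  uint32_t b = 0;
  while (acc != 0) { ++b; acc >>= 1; }
  return b;
}

// encodeBlock: one word for the bit width, then 128 values packed at b bits.
// No length is written because the block size is fixed.
void encodeBlock(const uint32_t *in, std::vector<uint32_t> &out) {
  const uint32_t b = bitWidth(in);
  out.push_back(b);
  uint64_t buffer = 0;  // bits not yet written out
  uint32_t filled = 0;  // number of valid bits in buffer
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    buffer |= static_cast<uint64_t>(in[i]) << filled;
    filled += b;
    if (filled >= 32) {  // flush one full 32-bit word
      out.push_back(static_cast<uint32_t>(buffer));
      buffer >>= 32;
      filled -= 32;
    }
  }
  // 128 * b is a multiple of 32, so no partial word is left over here.
}

// decodeBlock: reads one block and returns a pointer just past it.
const uint32_t *decodeBlock(const uint32_t *in, uint32_t *out) {
  const uint32_t b = *in++;
  assert(b <= 32);
  const uint64_t mask = (1ull << b) - 1;
  uint64_t buffer = 0;
  uint32_t available = 0;
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    if (available < b) {  // refill with the next 32-bit word
      buffer |= static_cast<uint64_t>(*in++) << available;
      available += 32;
    }
    out[i] = static_cast<uint32_t>(buffer & mask);
    buffer >>= b;
    available -= b;
  }
  return in;
}

// For contrast: an encodeArray-style wrapper that also stores the length.
// (n is assumed to be a multiple of kBlockSize to keep the sketch short.)
void encodeArrayStyle(const uint32_t *in, std::size_t n, std::vector<uint32_t> &out) {
  out.push_back(static_cast<uint32_t>(n));  // redundant when n is always 128
  for (std::size_t i = 0; i + kBlockSize <= n; i += kBlockSize)
    encodeBlock(in + i, out);
}
```

For the values 0 to 127 the block needs 1 + 4*7 = 29 words; the array-style wrapper adds one more word for the length, which carries no information when every block holds 128 values.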

amallia, Aug 11 '18 17:08

For an immediate solution, please consider these lower-level libraries:

  • https://github.com/lemire/simdcomp
  • https://github.com/lemire/streamvbyte/

The simdcomp library was designed specifically for such low-level work.
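simdcomp works on blocks of 128 integers and leaves the surrounding layout (where to keep each block's bit width, how to index blocks) to the caller. The sketch below is a scalar emulation of that style of interface, written for illustration: `maxBits`, `packVertical` and `unpackVertical` are hypothetical names, and the 4-lane interleaving (value i in lane i % 4) is an assumption about how a 128-bit SIMD packer distributes values, not a copy of simdcomp's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not simdcomp's API): a scalar emulation of packing one
// 128-integer block in a 4-lane "vertical" layout, as a 128-bit SIMD packer might.
// Value i lives in lane i % 4; each lane packs its 32 values independently, and
// word w of lane l is stored at out[4 * w + l]. The caller stores b separately.

// Smallest bit width able to represent every value in the 128-integer block.
uint32_t maxBits(const uint32_t *in) {
  uint32_t acc = 0;
  for (uint32_t i = 0; i < 128; ++i) acc |= in[i];
  uint32_t b = 0;
  while (acc != 0) { ++b; acc >>= 1; }
  return b;
}

// Packs 128 values (each < 2^b) into exactly 4 * b 32-bit words.
std::vector<uint32_t> packVertical(const uint32_t *in, uint32_t b) {
  assert(b <= 32);
  std::vector<uint32_t> out(4 * b, 0);
  if (b == 0) return out;  // all values are zero; nothing to store
  for (uint32_t lane = 0; lane < 4; ++lane) {
    for (uint32_t k = 0; k < 32; ++k) {  // k-th value of this lane
      const uint64_t v = in[lane + 4 * k];
      const uint32_t bitPos = k * b;     // bit offset inside the lane's stream
      const uint32_t word = bitPos / 32, shift = bitPos % 32;
      out[4 * word + lane] |= static_cast<uint32_t>(v << shift);
      if (shift + b > 32)  // the value straddles two words of this lane
        out[4 * (word + 1) + lane] |= static_cast<uint32_t>(v >> (32 - shift));
    }
  }
  return out;
}

// Inverse of packVertical: writes 128 values to out.
void unpackVertical(const uint32_t *in, uint32_t b, uint32_t *out) {
  assert(b <= 32);
  if (b == 0) {
    for (uint32_t i = 0; i < 128; ++i) out[i] = 0;
    return;
  }
  const uint64_t mask = (1ull << b) - 1;
  for (uint32_t lane = 0; lane < 4; ++lane) {
    for (uint32_t k = 0; k < 32; ++k) {
      const uint32_t bitPos = k * b;
      const uint32_t word = bitPos / 32, shift = bitPos % 32;
      uint64_t v = static_cast<uint64_t>(in[4 * word + lane]) >> shift;
      if (shift + b > 32)  // pull the high part from the lane's next word
        v |= static_cast<uint64_t>(in[4 * (word + 1) + lane]) << (32 - shift);
      out[lane + 4 * k] = static_cast<uint32_t>(v & mask);
    }
  }
}
```

Because the packed size is always 4 * b words, a caller who records b per block (in whatever header format suits the project) can locate and decode any block without a stored length.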

Otherwise, it is certainly possible to add what you seek to FastPFor, pull requests are invited.

lemire, Aug 12 '18 14:08

I managed to encode blocks of 128 elements using Simple8b, for example, by setting MarkLength to false. Then it won't write the length for every block, and the final result of calling encodeArray on small blocks is probably the same as calling it on the entire list.
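To see why dropping the length marker makes per-block and whole-list encoding nearly equivalent, here is a simplified Simple8b-style codec. It is an illustrative sketch, not FastPFor's `Simple8b` class: the zero-run selectors are omitted, and a selector is chosen only when enough values remain to fill it, so no word is padded and a decoder that knows a block holds 128 values can stop exactly at the block boundary. The names `encode`, `decode` and `kSelectors` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified Simple8b-style codec for illustration (not FastPFor's Simple8b):
// each 64-bit word holds a 4-bit selector and 60 payload bits.
struct Selector { uint32_t count, bits; };
constexpr Selector kSelectors[16] = {
    {0, 0}, {0, 0},  // selectors 0-1 (runs of zeros) are unused in this sketch
    {60, 1}, {30, 2}, {20, 3}, {15, 4}, {12, 5}, {10, 6}, {8, 7},
    {7, 8}, {6, 10}, {5, 12}, {4, 15}, {3, 20}, {2, 30}, {1, 60}};

// Encodes n values without storing n (the MarkLength = false idea).
void encode(const uint32_t *in, std::size_t n, std::vector<uint64_t> &out) {
  std::size_t pos = 0;
  while (pos < n) {
    for (uint64_t sel = 2; sel < 16; ++sel) {  // densest selector first
      const Selector s = kSelectors[sel];
      if (n - pos < s.count) continue;  // not enough values left to fill it
      bool fits = true;
      for (uint32_t i = 0; i < s.count && fits; ++i)
        fits = (static_cast<uint64_t>(in[pos + i]) >> s.bits) == 0;
      if (!fits) continue;
      uint64_t word = sel << 60;
      for (uint32_t i = 0; i < s.count; ++i)
        word |= static_cast<uint64_t>(in[pos + i]) << (i * s.bits);
      out.push_back(word);
      pos += s.count;
      break;  // selector 15 (one 60-bit slot) always fits, so this is reached
    }
  }
}

// Decodes exactly n values; returns the number of 64-bit words consumed.
std::size_t decode(const uint64_t *in, std::size_t n, uint32_t *out) {
  std::size_t pos = 0, words = 0;
  while (pos < n) {
    const uint64_t word = in[words++];
    const Selector s = kSelectors[word >> 60];
    assert(s.count > 0);  // selectors 0-1 never appear in this sketch
    const uint64_t mask = (1ull << s.bits) - 1;
    for (uint32_t i = 0; i < s.count; ++i)
      out[pos++] = static_cast<uint32_t>((word >> (i * s.bits)) & mask);
  }
  return words;
}
```

With 256 ones, encoding the whole list and encoding two 128-value blocks both produce 6 words. With other inputs the two word counts can differ slightly, because in the per-block version no word spans a block boundary.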

My main issue is with SIMDBP128. First, do you think it makes sense to encode blocks of 128 elements with SIMDBP128? My impression is that it only makes sense for lengths of at least 16*128.
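One way to reason about the question is to count header bytes per block. The sketch below rests on a hypothetical layout, not on FastPFor's actual SIMDBP128 format: one bit-width byte per 128-integer block, stored in a header padded to a full 16-byte SIMD word. Under that assumption, 16 blocks exactly fill one header, which is one possible reason a threshold like 16*128 would appear. `bytesFor` and `kHeaderBytes` are illustrative names.

```cpp
#include <cstdint>

// Hypothetical layout, for illustration only (not FastPFor's SIMDBP128 format):
// - a 128-integer block packed at b bits occupies 128 * b / 8 = 16 * b bytes;
// - bit widths are stored one byte per block in a header that is padded to a
//   full 16-byte SIMD word and shared by up to 16 blocks.
constexpr uint32_t kHeaderBytes = 16;

// Bytes used by `groups` header groups, each covering `blocksPerGroup` blocks
// that are all packed at b bits.
constexpr uint32_t bytesFor(uint32_t blocksPerGroup, uint32_t groups, uint32_t b) {
  return groups * (kHeaderBytes + blocksPerGroup * 16 * b);
}

// 16 blocks at b = 8, each with its own padded header, versus one shared header.
static_assert(bytesFor(1, 16, 8) == 2304, "16 * (16 + 128)");
static_assert(bytesFor(16, 1, 8) == 2064, "16 + 16 * 128");
```

Under these assumptions a lone block pays 16 header bytes per 128 bytes of payload (about 11% of the total), whereas 16 blocks sharing one header pay 16 bytes per 2048 bytes (under 1%).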

amallia, Aug 12 '18 16:08

If you want to design your own data layout, then the simdcomp library is probably a much better choice. It also comes with extra functions; see this paper: https://arxiv.org/abs/1611.05428

lemire, Aug 12 '18 22:08