improving the utility function for sampling compression
Currently, following the BtrBlocks paper, our sampling compression logic selects the combination of encodings that yields the smallest size in bytes. This implicitly treats all of our codecs as "equivalently good along all dimensions besides size", which is not true.
Some examples:
- we may not want to add another entire level of compression in order to save 1 byte from a large array, because the size saving isn't worth the tradeoff in compression/decompression speed
- we may want to prefer (patched) codecs without data dependencies (e.g., dict/zigzag/ffor/constant/frequency) over variants of delta encoding, in order to support efficient random access when possible
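
To make the two examples concrete, here is a rough sketch of what a non-size-only utility function could look like. It is illustrative Python, not the actual compressor code: `Candidate`, `cost`, `pick`, the penalty parameters, and all of the sizes are hypothetical and chosen only to show the shape of the idea. The assumption is that each candidate combination reports its sampled size, how many cascaded levels it uses, and whether it supports random access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str            # label for the encoding combination (hypothetical)
    size_bytes: int      # estimated compressed size from the sample
    depth: int           # number of cascaded encoding levels
    random_access: bool  # False if decoding a value depends on earlier values


def cost(c: Candidate, level_penalty: float = 0.02,
         dependency_penalty: float = 0.10) -> float:
    """Size in bytes, inflated by a charge per level and for data dependencies."""
    factor = 1.0 + level_penalty * c.depth
    if not c.random_access:
        factor += dependency_penalty
    return c.size_bytes * factor


def pick(candidates, **penalties) -> Candidate:
    """Return the candidate with the lowest penalized cost."""
    return min(candidates, key=lambda c: cost(c, **penalties))


# Made-up numbers for a large array.
candidates = [
    Candidate("dict", 1_000_000, depth=1, random_access=True),
    Candidate("dict+ffor", 999_999, depth=2, random_access=True),
    Candidate("delta+ffor", 950_000, depth=2, random_access=False),
]

print(pick(candidates).name)
print(pick(candidates, level_penalty=0.0, dependency_penalty=0.0).name)
```

With both penalties set to zero this reduces to the current smallest-size rule. With the default (arbitrary) penalties, the extra level that saves 1 byte and the delta variant that saves 5% both lose to plain dict. Real penalty values would presumably need to come from measured compression/decompression throughput rather than constants like these.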