Is it possible for JavaFastPFOR to work with LONG?
Hello, I'm facing exactly the same issue, and I'm about to implement a workaround: write the longs to a ByteBuffer, then call asIntBuffer() and .array() on it. The result can then be divided into two parts, one holding the first 32 bits of each long and one holding the last 32 bits, so that there is smaller variance within each set of integers. I'm about to test this approach and will report the compression ratio achieved. But it would certainly be faster if JavaFastPFOR already worked with int64 (some algorithms would require significant modification).
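For readers following along, here is a minimal sketch of the high/low split described in this comment. It uses plain shifts rather than a ByteBuffer, and the class and method names (`LongSplitSketch`, `split`, `merge`) are hypothetical, not part of JavaFastPFOR. Each resulting `int[]` could then be handed to whatever integer codec you already use.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JavaFastPFOR.
final class LongSplitSketch {

    /** Splits each long into its upper and lower 32 bits; returns {highParts, lowParts}. */
    static int[][] split(long[] values) {
        int[] high = new int[values.length];
        int[] low = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            high[i] = (int) (values[i] >>> 32); // upper 32 bits
            low[i] = (int) values[i];           // lower 32 bits (truncating cast)
        }
        return new int[][] {high, low};
    }

    /** Rebuilds the original longs from the two int arrays. */
    static long[] merge(int[] high, int[] low) {
        long[] values = new long[high.length];
        for (int i = 0; i < high.length; i++) {
            // Mask the low part so its sign bit does not smear into the upper half.
            values[i] = ((long) high[i] << 32) | (low[i] & 0xFFFFFFFFL);
        }
        return values;
    }

    public static void main(String[] args) {
        long[] data = {1L, 1L << 40, -1L, Long.MIN_VALUE, 123456789012345L};
        int[][] parts = split(data);
        System.out.println(Arrays.equals(data, merge(parts[0], parts[1])));
    }
}
```

Keeping the high parts and low parts in separate arrays, rather than interleaving them, is what gives each array the smaller variance mentioned above.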
@kk00ss
ByteBuffers are often atrociously slow.
Could you please suggest another workaround? I can think of splitting the 64-bit integers into groups with differences no larger than int32.MaxValue, probably. It should be much simpler, actually.
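One possible reading of the grouping idea, as a rough sketch: start a group at some base value and keep adding values while their offset from that base fits in a non-negative int. This is only an interpretation of the comment above; `LongGroupingSketch`, `Group`, `group` and `ungroup` are hypothetical names, and the approach works best when the input is sorted or at least locally clustered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaFastPFOR.
final class LongGroupingSketch {

    /** One run of values stored as a 64-bit base plus 32-bit offsets from it. */
    static final class Group {
        final long base;
        final int[] offsets;

        Group(long base, int[] offsets) {
            this.base = base;
            this.offsets = offsets;
        }
    }

    /** Greedily cuts the input into runs whose offsets from the run's first value fit in an int. */
    static List<Group> group(long[] values) {
        List<Group> groups = new ArrayList<>();
        int start = 0;
        while (start < values.length) {
            long base = values[start];
            int end = start + 1;
            while (end < values.length) {
                long diff = values[end] - base;
                // A negative or too-large difference closes the current group.
                if (diff < 0 || diff > Integer.MAX_VALUE) {
                    break;
                }
                end++;
            }
            int[] offsets = new int[end - start];
            for (int i = start; i < end; i++) {
                offsets[i - start] = (int) (values[i] - base);
            }
            groups.add(new Group(base, offsets));
            start = end;
        }
        return groups;
    }

    /** Reverses group(): adds each offset back to its group's base. */
    static long[] ungroup(List<Group> groups) {
        int total = 0;
        for (Group g : groups) {
            total += g.offsets.length;
        }
        long[] out = new long[total];
        int pos = 0;
        for (Group g : groups) {
            for (int offset : g.offsets) {
                out[pos++] = g.base + offset;
            }
        }
        return out;
    }
}
```

The `offsets` arrays are ordinary non-negative ints, so they could be passed to an existing integer codec, while the bases would have to be stored separately. On data that jumps around a lot this degenerates into many one-element groups.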
Contributions to the library toward the resolution of this issue are invited.
+1 need support for long as well
:+1: would be super interesting
Pull requests invited.
Any progress here?
@PhilippS93 Are you interested in helping?
@lemire I would be keen to give a hand, but do you have some designs in mind? Here are a few suggestions:
- We would introduce a `LongCODEC` (similarly to `IntegerCODEC`)?
- A first implementation would consist in relying on an underlying `IntegerCODEC` (similarly to `Composition` and `SkippableComposition`)?
- This LongAs2IntsCodec would split one long as 2 integers?
- We would prefer buffering 128 longs into 2x128 ints, in order to comply with `FastPFOR128`, hoping that the 128 high parts would compress reasonably (like the 128 low parts), as we expect interleaving high parts and low parts to give poor results?
- Is there any alternative implementations? (Java, or non-Java FastPFOR implementations)
- Do we have some available dataset?
Yes. A LongCODEC seems like a reasonable first step. I would start with just Variable Byte. I would not start with something as complicated as FastPFOR128.
> This LongAs2IntsCodec would split one long as 2 integers

This does not sound ideal.

> Is there any alternative implementations ?

In C, surely... In Java, you will only find specific cases. E.g., you will definitely find existing code for a variable byte implementation.

> Do we have some available dataset ?

It is a good question. I do not have any.
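To make the "start with Variable Byte" suggestion concrete, here is a minimal, generic variable-byte sketch for longs: 7 payload bits per byte, with the top bit set on every byte except the last one of a value. It is an illustration only, with hypothetical names (`LongVarByteSketch`, `encode`, `decode`); it does not claim to match the byte layout or the interface of any codec in JavaFastPFOR.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of JavaFastPFOR.
final class LongVarByteSketch {

    /** Encodes each long (treated as unsigned) in 1 to 10 bytes, least significant bits first. */
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : values) {
            // While more than 7 bits remain, emit 7 bits and set the continuation bit.
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80));
                v >>>= 7; // unsigned shift so negative values terminate
            }
            out.write((int) v); // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Decodes exactly count values from bytes produced by encode(). */
    static long[] decode(byte[] bytes, int count) {
        long[] values = new long[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            long v = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }
}
```

Small values take one byte, while negative values take the full ten bytes because they are treated as large unsigned numbers. Applying a delta or zigzag transform first is a common way to keep the values small, though that is outside what was discussed here.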
Got it. Let me propose a first draft, we'll see what we can achieve.
Fixed by @blacelle, see https://github.com/lemire/JavaFastPFOR/pull/55
Of course, we can create many more codecs... the possibilities are endless, but we now have the base.
Closing.