Is it possible for JavaFastPFOR to work with LONG?

Open yan-qi opened this issue 10 years ago • 9 comments

yan-qi avatar Jul 02 '15 21:07 yan-qi

Hello, I'm facing exactly the same issue, and I'm about to implement a workaround: write the longs to a ByteBuffer, then call asIntBuffer() and .array() on it. The result can then be divided into two parts, one holding the first 32 bits of each long and one holding the last 32 bits, so there will be smaller variance within each of the two sets of integers. I'm about to test this approach and will report the compression ratio I achieve. But it would surely be faster if JavaFastPFOR already worked with int64 (some algorithms would require significant modification).
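For illustration, here is a minimal sketch of the high/low split described above. It uses plain shifts rather than a ByteBuffer, and the class and method names (`LongSplit`, `split`, `merge`) are hypothetical, not part of JavaFastPFOR. Each resulting `int[]` could then be handed to any existing integer codec.

```java
/** Hypothetical helper, not part of JavaFastPFOR. */
final class LongSplit {

    /** Returns {high, low}: the upper and the lower 32 bits of each input long. */
    static int[][] split(long[] values) {
        int[] high = new int[values.length];
        int[] low = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            high[i] = (int) (values[i] >>> 32); // upper 32 bits
            low[i] = (int) values[i];           // lower 32 bits (truncating cast)
        }
        return new int[][] { high, low };
    }

    /** Inverse of split: rebuilds each long from its two halves. */
    static long[] merge(int[] high, int[] low) {
        long[] values = new long[high.length];
        for (int i = 0; i < high.length; i++) {
            // Mask the low half so its sign bit does not smear into the high half.
            values[i] = ((long) high[i] << 32) | (low[i] & 0xFFFFFFFFL);
        }
        return values;
    }
}
```

If the longs are small or close together, the `high` array is mostly constant, which is where most of the expected gain would come from.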

kk00ss avatar Dec 09 '15 12:12 kk00ss

@kk00ss

ByteBuffers are often atrociously slow.

lemire avatar Dec 09 '15 12:12 lemire

Could you please suggest another workaround? I could probably split the 64-bit integers into groups whose differences are no larger than int32.MaxValue. That should actually be much simpler.
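One possible reading of this grouping idea, sketched under the assumption that the longs are sorted in ascending order: start a new group whenever the gap to the previous value exceeds `Integer.MAX_VALUE`, keep each group's first value as a 64-bit base, and store the remaining values as 32-bit gaps. The names `DeltaGroups`, `groupStarts` and `deltas` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JavaFastPFOR. Input must be sorted ascending. */
final class DeltaGroups {

    /** Returns the index at which each group starts. */
    static List<Integer> groupStarts(long[] sorted) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            // The subtraction may wrap for very distant values, so compare as unsigned.
            if (i == 0 || Long.compareUnsigned(sorted[i] - sorted[i - 1], Integer.MAX_VALUE) > 0) {
                starts.add(i);
            }
        }
        return starts;
    }

    /**
     * Gaps between consecutive values of one group, sorted[from..to), with to > from.
     * The base value sorted[from] has to be stored separately as a long.
     */
    static int[] deltas(long[] sorted, int from, int to) {
        int[] gaps = new int[to - from - 1];
        for (int i = from + 1; i < to; i++) {
            gaps[i - from - 1] = (int) (sorted[i] - sorted[i - 1]); // fits in an int by construction
        }
        return gaps;
    }
}
```

The gap arrays are non-negative ints, so they are the kind of input the existing 32-bit codecs are designed for. Whether this beats the high/low split depends on how clustered the data is.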

kk00ss avatar Dec 09 '15 12:12 kk00ss

Contributions to the library toward the resolution of this issue are invited.

lemire avatar Dec 09 '15 14:12 lemire

+1 need support for long as well

tenstriker avatar Mar 11 '16 05:03 tenstriker

:+1: would be super interesting

JohannesLichtenberger avatar Feb 26 '20 17:02 JohannesLichtenberger

Pull requests invited.

lemire avatar Feb 26 '20 18:02 lemire

Any progress here?

PhilippS93 avatar Jan 06 '22 12:01 PhilippS93

@PhilippS93 Are you interested in helping?

lemire avatar Jan 08 '22 01:01 lemire

@lemire I would be keen to give a hand, but do you have a design in mind? Here are a few suggestions:

  • We would introduce a LongCODEC (similar to IntegerCODEC)?
  • A first implementation would rely on an underlying IntegerCODEC (similar to Composition and SkippableComposition)?
  • This LongAs2IntsCodec would split one long as 2 integers?
  • We would prefer buffering 128 longs into 2x128 ints, in order to comply with FastPFOR128, hoping that the 128 high parts would compress reasonably well (like the 128 low parts), as we expect interleaving high parts and low parts to give poor results?
  • Are there any alternative implementations (Java or non-Java FastPFOR implementations)?
  • Do we have an available dataset?
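To make the buffering proposal concrete, here is a rough sketch of the non-interleaved layout: for each block of 128 longs, the output holds the 128 high parts followed by the 128 low parts. This is only an illustration of the layout. `BlockedLongSplit` and its methods are hypothetical names, the input length is assumed to be a multiple of 128, and no actual codec is invoked.

```java
/** Hypothetical layout helper, not part of JavaFastPFOR. */
final class BlockedLongSplit {
    static final int BLOCK = 128;

    /** Per block of 128 longs, writes 128 high words followed by 128 low words. */
    static int[] toInts(long[] in) {
        if (in.length % BLOCK != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + BLOCK);
        }
        int[] out = new int[2 * in.length];
        for (int b = 0; b < in.length; b += BLOCK) {
            for (int i = 0; i < BLOCK; i++) {
                out[2 * b + i] = (int) (in[b + i] >>> 32);  // high parts first
                out[2 * b + BLOCK + i] = (int) in[b + i];   // then low parts
            }
        }
        return out;
    }

    /** Inverse of toInts. */
    static long[] toLongs(int[] in) {
        long[] out = new long[in.length / 2];
        for (int b = 0; b < out.length; b += BLOCK) {
            for (int i = 0; i < BLOCK; i++) {
                long high = in[2 * b + i];
                long low = in[2 * b + BLOCK + i] & 0xFFFFFFFFL;
                out[b + i] = (high << 32) | low;
            }
        }
        return out;
    }
}
```

With this layout, a block-based integer codec would see runs of 128 similar high parts and 128 similar low parts instead of alternating values of very different magnitude.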

blacelle avatar Nov 25 '22 12:11 blacelle

Yes. A LongCODEC seems like a reasonable first step. I would start with just Variable Byte. I would not start with something as complicated as FastPFOR128.

This LongAs2IntsCodec would split one long as 2 integers

This does not sound ideal.

Are there any alternative implementations?

In C, surely... in Java, you will only find specific cases. E.g., you will definitely find existing code for a variable byte implementation.

Do we have an available dataset?

It is a good question. I do not have any.
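As a point of reference for the variable-byte starting point suggested above, here is a minimal, self-contained sketch for 64-bit values. It assumes the common continuation-bit convention (7 payload bits per byte, high bit set on every byte except the last of a value). The library's own VariableByte class may use a different convention, and `LongVarByte` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch, not the library's implementation. */
final class LongVarByte {

    /** Encodes each long as 1 to 10 bytes, least-significant 7 bits first. */
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : values) {
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80)); // more bytes follow
                v >>>= 7;                             // unsigned shift, so negatives terminate
            }
            out.write((int) v);                       // final byte, high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes exactly count values from the start of bytes. */
    static long[] decode(byte[] bytes, int count) {
        long[] values = new long[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            long v = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }
}
```

Small values take one byte, while negative values always take ten, so in practice this would usually be combined with delta or zigzag-style preprocessing.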

lemire avatar Nov 25 '22 15:11 lemire

Got it. Let me propose a first draft, we'll see what we can achieve.

blacelle avatar Nov 25 '22 17:11 blacelle

Fixed by @blacelle, see https://github.com/lemire/JavaFastPFOR/pull/55

Of course, we can create many more codecs... the possibilities are endless, but we now have the base.

Closing.

lemire avatar Nov 28 '22 15:11 lemire