FastPFor C++ and Java implementations of fastpfor + varbyte unable to decode each others output

I've been investigating using the FastPFor algorithm for integer columns in Druid, following up on the discussion and experimentation mentioned in this issue. Initially, I was solely using your Java port of this library, and it's the basis of a PR I currently have open. However, curiosity got the better of me and I wanted to compare performance with the c++ implementation, sooo, I wrote a JNI wrapper to allow me to call this library from java, resulting in this branch which is a sort of alternate implementation of the PR. That's the short of how I encountered the issue mentioned in the title - seeing what happened when I fed one into the other. I have not had a chance to dig in to determine if the issue is in fact in this library or the other one, but it can be replicated by modifying the example program to write encoded data to a file, then loading that into a bytebuffer in java and attempting to decode.

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <ctime>

#include "codecfactory.h"
using namespace std;
using namespace FastPForLib;

int main()
{
  IntegerCODEC &codec = *CODECFactory::getFromName("simdfastpfor256");
  size_t N = 10000;
  std::vector<uint32_t> mydata(N);

  srand (time(NULL));

  for (uint32_t i = 0; i < N; i++)
    mydata[i] = rand() % 100;

  std::vector<uint32_t> compressed_output(N + 1024);
  size_t compressedsize = compressed_output.size();
  codec.encodeArray(mydata.data(), mydata.size(), compressed_output.data(),
                    compressedsize);
  ofstream out("numbers.bin", ios::out | ios::binary);
  if(!out) {
    cout << "Cannot open file.";
    return 1;
   }

  out.write((char *)compressed_output.data(), N * sizeof(uint32_t));

  out.close();

  cout << "length " << compressedsize << "\n";
  return 0;
}

and then adjust the encodedSize variable to match the output of the c++ program in the following java

import com.google.common.io.Files;
import me.lemire.integercompression.FastPFOR;
import me.lemire.integercompression.IntWrapper;
import me.lemire.integercompression.SkippableComposition;
import me.lemire.integercompression.SkippableIntegerCODEC;
import me.lemire.integercompression.VariableByte;

...
    int buffSize = 1 << 16;

    int numValues = 10000;
    int maxNumValues = buffSize >> 2;

    SkippableIntegerCODEC codec = new SkippableComposition(new FastPFOR(), new VariableByte());
    int encodedSize = 2214;

    ByteBuffer encodedValues = Files.map(new File("numbers.bin"));

    int[] valueArray = new int[numValues];
    int[] encodedValueArray = new int[encodedSize];

    // copy encoded buffer to int array
    for (int i = 0; i < encodedSize; i++) {
      encodedValueArray[i] = encodedValues.getInt();
    }

    // decode with java

    IntWrapper outPos = new IntWrapper(0);
    codec.headlessUncompress(encodedValueArray, new IntWrapper(0), encodedSize, valueArray, outPos, numValues);
    // explodes before we get here
    assert (numValues == outPos.get());

The exception in this case was

java.lang.ArrayIndexOutOfBoundsException: 2555904
	at me.lemire.integercompression.FastPFOR.decodePage(FastPFOR.java:239)
	at me.lemire.integercompression.FastPFOR.headlessUncompress(FastPFOR.java:229)
	at me.lemire.integercompression.SkippableComposition.headlessUncompress(SkippableComposition.java:55)

but during experimentation I recall seeing exceptions coming from the VariableByte implementation as well, so the exact error may be dependent on length and composition of the input. I don't have one of these exceptions handy unfortunately, I will attempt to trigger it again and update the ticket. The JNI wrapper version can correctly read and decode the file data generated by the example program.

Of potential interest: the size of output varies slightly between the implementations, with c++ being a handful of int32 sized values larger. The full set of compatibility tests I threw together can be found here with the c++ snippet above, here. These tests that pass for me are java -> java, jni -> jni, external native encoded file -> jni, and the failing tests are jni -> java, java -> jni, external native encoded file -> java.

Another thing, which might be potentially related and indicative of a bug here (or a bug somewhere in my code), I experience an assert failure at the end of decodeArray function when using simd versions of fastpfor that only popped up whenever I would attempt to decode arbitrary offsets of a bytebuffer (from a memory mapped file). I did not experience this issue during initial testing of my JNI wrapper which was using newly allocated buffers populated with data. I explored briefly; commenting out the assert statement in the header and rebuilding the library resulted in the output decoded without exploding, but the final value would be incorrect when I went to validate the output. Since I experienced this issue only when decoding the mapped buffer where an encoded chunk could be located at arbitrary offsets, I speculated it was an issue with alignment, and indeed copying the values to a 16 byte aligned chunk has caused the assert failure to not appear again. That said, I haven't had the opportunity to try to replicate this behavior purely in c++ yet, so it is possible this issue is one of my own rather than this library, but wanted to throw it out there in the event it helps isolate why the java and c++ outputs are incompatible.

In general I've tested my branches with both my mac os laptop with

Apple LLVM version 10.0.0 (clang-1000.10.25.5)
g++-8 (Homebrew GCC 8.2.0) 8.2.0

and ubuntu linux with

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

but will double check that both issues happen on linux as soon as possible, and will attempt to dig into this deeper in general whenever I get the chance.

I'm quite excited to get this stuff worked into Druid, and using this version of the library is currently my preference since it's benchmarking faster and makes the implementation of the java part of my PR a bit more straightforward, but I think it's probably important to have the ability to fallback to a java only implementation to make the decision a little less permanent.

Aug 30 '18 08:08 clintropolis

It is certainly the case that the C++ code and the Java code is not interoperable. If you find a claim that it is the case somewhere, then this claim is false and will need to be removed.

If these distinct libraries were interoperable, we would have actual tests.

It is certainly possible to make it interoperable. For example, the Parquet folks started from some code in the Java library you mentioned, and then they transformed so it could be interoperable with fast and efficient C++ code on little-endian processors.

However, it is not trivial to make Java and C++ interoperable in this setting. You have to account for various things such as the fact that Java is big-endian whereas C++ endianness is typically determined by the underlying processor (and not defined by the language), and the fact that Java does not do pointer casting, SIMD instructions and pointer arithmetic, so some optimizations that are trivial in C++ are just not possible in Java.

So if cross-language interoperability is a feature you need, we do not currently provide it and nobody is working on it. There has never been any such request before. Pull requests are invited, however.

Now, there are interoperable (Java/C++) code out there. For example, Masked VByte is a C (C++) decoder for the varint format used by Lucene (Java). This is used in production at Indeed.

http://maskedvbyte.org

https://github.com/lemire/MaskedVByte

And, if this need saying, the Roaring format that Druid already uses is interoperable (C,C++,Python, Java and so forth) across languages.

Aug 31 '18 16:08 lemire

If you have picked a particular format you care about, and you interested in helping write a interoperable Java codec, I am certainly available to help with that. I can help make sure that if you are a capable Java/C++ coder, it won't take more than a couple of days of work.

It is not going to be just turning on a switch, however. And the lack of interoperability right now is not a bug. It is a missing feature.

Aug 31 '18 16:08 lemire

Oops, I meant to put in my writeup that I had no explicit reason to expect that this should work, but I got distracted collecting the details and forgot to put that in along the way. My decision to feed the outputs of one into the other was more of an ad hoc 'hmm, I wonder what happens when I do this', coupled with a perhaps naive hope that maybe the composition of fastpfor256 and variablebyte was more of a specification than it is, and did the writeup in the event it was supposed to be the case. In other words, don't worry, the docs didn't mislead me into doing this 😅

I don't know if having a fallback to java is strictly necessary for us, I'll need to discuss this with other Druid commiters. Most of my experimentation has been with fastpfor+variablebyte, so a java implementation that could decode simdfastpfor256 would probably be what I am after if a java fallback is necessary. I will follow up once I get some feedback one way or another.

Thanks for the quick response! (and for all the research papers and code, it's quite fascinating!)

Sep 03 '18 22:09 clintropolis

Ok. As I wrote, it is quite doable to have cross-language interoperability. Just a few days of work.

Sep 04 '18 23:09 lemire

hi @lemire , do you have any pointer to the Parquet team's work on making Java implementation interoperable with c++? I am interested in looking into this problem.

Feb 15 '19 22:02 xndai

@xndai

Can you elaborate? Parquet’s C++ and Java code is on GitHub.

Feb 15 '19 23:02 lemire

Ok, after some research, it seems to me that the endian is handled by IntBasedBitPackingGenerator.java (https://github.com/apache/parquet-mr/blob/dc61e510126aaa1a95a46fe39bf1529f394147e9/parquet-generator/src/main/java/org/apache/parquet/encoding/bitpacking/IntBasedBitPackingGenerator.java). The class generates codes for both little endian and big endian packing/unpacking methods.

Besides the endian issue, is there anything affects the interoperability? I am asking this is because we are adapting FastPFor C++. But potentially we will have a Java reader in the future.

Feb 15 '19 23:02 xndai

It is difficult in the abstract to list everything one needs to take into account but you have: Java does not have native unsigned integers, packed data structures having various field widths are difficult to do efficiently in Java, Java does not have SIMD access in the same way you do in C++...

Feb 16 '19 00:02 lemire

FastPFor FastPFor copied to clipboard

C++ and Java implementations of fastpfor + varbyte unable to decode each others output

FastPFor
FastPFor copied to clipboard