roaring
roaring copied to clipboard
What to do about big endian architectures' support?
During evaluation of #198 the issue of big endian code not being tested by the CI pipelines was raised.
This means, despite code being in place to support it, the architectures should be considered unsupported because they're untested.
There are a few options to deal with this:
- Properly considering those archs unsupported, in which case the reasonable way to deal with it is to remove the code and just assume little endian;
- Add pipelines to run on at least one such architecture.
The Go toolchain supports a few BE archs, in particular PowerPC64 with GOARCH=ppc64
, and qemu can be used to run commands on the emulated architecture, and I successfully cross-compiled and run the tests for ppc64 that way locally:
$ GOARCH=ppc64 go test -c -o ppc64_test
$ qemu-ppc64 ./ppc64_test
==roaring==
{1,2,3,4,5,100,1000}
{3,4,1000}
{}
Cardinality: 7
Contains 3? true
1
3
4
5
1000
Wrote 22 bytes
I wrote the content to a byte stream and read it back.
size before run optimize: 1810 bytes, and after: 38 bytes.
2022/01/13 19:39:53 rtest N= 15
2022/01/13 19:39:53 rtest N= 1024
2022/01/13 19:39:54 rtest N= 4096
2022/01/13 19:39:54 rtest N= 65536
2022/01/13 19:39:55 rtest N= 1048576
2022/01/13 19:40:04 rtest N= 15
2022/01/13 19:40:04 rtest N= 100
2022/01/13 19:40:04 rtest N= 512
2022/01/13 19:40:04 rtest N= 1023
2022/01/13 19:40:04 rtest N= 1025
2022/01/13 19:40:04 rtest N= 4095
2022/01/13 19:40:04 rtest N= 4096
2022/01/13 19:40:04 rtest N= 4097
2022/01/13 19:40:04 rtest N= 65536
2022/01/13 19:40:05 rtest N= 1048576
2022/01/13 19:41:26 rtest N= 15
2022/01/13 19:41:26 rtest N= 1024
2022/01/13 19:41:26 rtest N= 4096
2022/01/13 19:41:26 rtest N= 65536
2022/01/13 19:41:26 rtest N= 1048576
2022/01/13 19:41:35 rtest N= 15
2022/01/13 19:41:35 rtest N= 100
2022/01/13 19:41:35 rtest N= 512
2022/01/13 19:41:35 rtest N= 1023
2022/01/13 19:41:35 rtest N= 1025
2022/01/13 19:41:35 rtest N= 4095
2022/01/13 19:41:35 rtest N= 4096
2022/01/13 19:41:35 rtest N= 4097
2022/01/13 19:41:35 rtest N= 65536
2022/01/13 19:41:36 rtest N= 1048576
200100
PASS
In case we choose to support those by adding pipelines to check them, we should decide whether we want to make them first or second class citizens with regards to performance. Most mainstream architectures being little endian, it may make sense to trade off performance in BE if this improves performance for the LE case.
This question is practical rather than hypothetical, as it was suggested for hashing to cast slices in a way that would mean extra allocations for BE, in exchange for the ability to pass bigger blocks to the hasher.
For a start, it would be interesting to know whether there are currently any users of BE roaring.
Most mainstream architectures being little endian, it may make sense to trade off performance in BE if this improves performance for the LE case.
I think that this is absolutely true. BE is a fringe requirement.
I bet that we have no BE user. It might be fine to have a fallback (slow) approach.
I think it's worth documenting in the README that BE is only supported as a fallback and will suffer degraded performance for some operations. Just to avoid surprises and spurious issues in the unlikely case someone chooses to user roaring there.
I'm working on a PR to try and test on ppc64. I could add a line in that same PR about it.
I believe https://github.com/RoaringBitmap/roaring/blob/master/serialization_generic.go does not assume endianness, hence the "generic" monicker. I also believe it to be correct, as that was what was being used on arm64 machines until we updated the build flags. Now, those arm64 machines are little endian, so there might be something else that breaks on BE, but the intent to support both is there, at least. The performance wasn't terrible using serialization_generic, but it did do many more allocations, as it couldn't do unsafe slice manipulations.
@jacksonrnewhouse that's not the code that runs for little endian. Despite saying it's generic, because it technically is, for little endian this is what gets built: https://github.com/RoaringBitmap/roaring/blob/master/serialization_littleendian.go
This performs better because it simply does unsafe casting of the slices, but is only valid on LE. It doesn't allocate or copy or call generic functions (those get inlined in practice tho, but all the rest is still overhead).
Regarding whether it is correct: we still need to test it, and we still prefer to update the build flags so LE use the LE-optimized path. We'll inevitably lag behind for some time when new archs come out, but that still is gonna happen and thus we'll still be skipping the generic code from tests.