Optimal pointcloud compression
Is your feature request related to a problem? Please describe.
I am looking for the best general-purpose solution (prioritizing a balance of compression ratio and decompression speed) for compressing pointcloud data, specifically long arrays of int32 values where each chunk of 3 corresponds to the x, y, z coordinates of a given point. The industry standard is laz, but it suffers from slow decompression (more than 10x slower than zstd) due to its use of slow arithmetic coders. There is also draco (which implements https://arxiv.org/abs/cs/9909018), which offers competitive compression ratios but is generally slower than zstd for both compression and decompression.
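For concreteness, the data is just a flat interleaved x/y/z stream of int32 values; something like the following (the file name and array names are only illustrative):

```python
import numpy as np

# Flat stream of int32 coordinates: x0, y0, z0, x1, y1, z1, ...
flat = np.fromfile("cloud.bin", dtype=np.int32)  # illustrative file name

# One point per row: shape (N, 3), 12 bytes per point.
points = flat.reshape(-1, 3)
```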
I have found pretty good results with the following steps (a rough sketch follows the list):
- Sorting the points in morton order
- Shuffling/transposing the bytes from an `N*12` array to a `12*N` array (so that all the first bytes of each point are clustered together, and so on) using the blosc2 "shuffle" filter
- Storing the byte delta using the blosc2 bytedelta filter
- Compressing with zstd -16
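A minimal sketch of this pipeline, with the blosc2 shuffle/bytedelta filters written out by hand in numpy so each step is explicit (the function names `morton_key` and `compress_points` are just illustrative; in practice I use blosc2's built-in filters):

```python
import numpy as np
import zstandard as zstd


def morton_key(x: int, y: int, z: int) -> int:
    """Interleave the bits of x, y, z into a single Morton (Z-order) key."""
    key = 0
    for i in range(33):  # 33 bits covers the full span of offset int32 values
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def compress_points(points: np.ndarray, level: int = 16) -> bytes:
    """points: (N, 3) int32 array of x, y, z coordinates."""
    # 1. Sort along the Morton curve (offset to non-negative values first so
    #    the bit interleaving orders negative coordinates correctly).
    shifted = points.astype(np.int64) - points.min(axis=0)
    keys = [morton_key(int(x), int(y), int(z)) for x, y, z in shifted]
    order = sorted(range(len(keys)), key=keys.__getitem__)
    ordered = points[order]

    # 2. Shuffle: view the N*12 bytes as (N, 12) and transpose to (12, N)
    #    so that bytes of equal significance sit next to each other.
    shuffled = ordered.view(np.uint8).reshape(-1, 12).T.copy()

    # 3. Byte delta along each of the 12 byte streams (wraps mod 256,
    #    so it is exactly invertible on decompression).
    zeros = np.zeros((12, 1), dtype=np.uint8)
    delta = np.diff(shuffled, axis=1, prepend=zeros)

    # 4. Plain zstd at a high level.
    return zstd.ZstdCompressor(level=level).compress(delta.tobytes())
```

Every step above is exactly reversible (unsort, un-transpose, cumulative sum mod 256), so decompression is just zstd plus cheap byte shuffling.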
This quite often gets a better compression ratio than laz, with an order of magnitude faster decompression.
For some datasets this outperforms (in terms of compression ratio) draco by more than 10%, but for other datasets draco outperforms the above scheme by more than 10%.
I could compress a subset of points with both my scheme and draco and use the best algorithm for the whole set of points, but I feel like there might be a better solution out there.
Describe the solution you'd like
Ideally I would like to configure and/or modify zstd to take more advantage of the inherent structure in the data and reliably get the best compression ratio on most/all datasets.
I would like any advice on where to look or what experiments I could try.
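For concreteness, this is roughly the configuration surface I mean, shown via the zstandard Python bindings; the parameter values below are only placeholders for the kind of knobs I could experiment with:

```python
import zstandard as zstd

# Placeholder values; the question is which of these knobs (or which new
# ones) could exploit the 12-byte / byte-plane structure of the data.
params = zstd.ZstdCompressionParameters.from_level(
    16,
    window_log=27,     # larger match window across the whole cloud
    target_length=4096,
    enable_ldm=True,   # long-distance matching
    ldm_hash_log=24,
)
cctx = zstd.ZstdCompressor(compression_params=params)
```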