TileDB
Using laz-perf as a LAZ compressor?
I've been looking into storing pointcloud data in TileDB arrays, but one of my hesitations is the larger data volumes relative to LAZ files. I played around with different compressors/levels for coords/attributes, but couldn't get anything close to the original LAZ filesize.
Would it be possible to link in laz-perf and provide laz as an additional compressor?
Hi @ryan-salo, we will soon publish several tutorials on tweaking the TileDB compression for LAZ data. The current defaults are not appropriate; we are fixing them in the upcoming release.
To achieve even better compression, we are designing a new compressor that will be especially beneficial for the GpsTime field, which is of type double. This is what is hurting TileDB vs. LAZ currently, not the rest of the fields, which compress pretty well with off-the-shelf compressors (like zstd and bzip2). To address this issue, the new compressor:
- Sorts on GpsTime within the GpsTime and X, Y and Z tiles (without impacting the rest of the attributes)
- Computes and sorts the pairwise XORs of the sorted GpsTime values
- Compresses the result with bzip2
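For readers who want a feel for the steps above, here is a minimal sketch in plain Python. It is not the TileDB compressor: the function names are made up for illustration, it works on a flat list rather than on tiles, and it leaves out the step of sorting the XORs because the thread does not say how that ordering would be undone on read. It also ignores the fact that X, Y and Z would have to be reordered the same way as GpsTime.

```python
import bz2
import struct


def xor_delta_compress(gps_times):
    """Sort doubles, XOR each 64-bit pattern with its predecessor, then bzip2.

    Hypothetical helper for illustration; not part of TileDB.
    """
    ordered = sorted(gps_times)
    n = len(ordered)
    # Reinterpret each double as an unsigned 64-bit integer.
    bits = struct.unpack(f"<{n}Q", struct.pack(f"<{n}d", *ordered))
    # XOR every value with the previous one (the first is XORed with 0).
    # Neighbouring sorted timestamps share their high bits, so most of the
    # XOR result is zeros, which a general-purpose compressor handles well.
    xored = [prev ^ cur for prev, cur in zip((0,) + bits, bits)]
    return bz2.compress(struct.pack(f"<{n}Q", *xored))


def xor_delta_decompress(blob):
    """Invert xor_delta_compress; returns the values in sorted order."""
    raw = bz2.decompress(blob)
    n = len(raw) // 8
    xored = struct.unpack(f"<{n}Q", raw)
    bits = []
    prev = 0
    for x in xored:
        prev ^= x  # undo the running XOR
        bits.append(prev)
    return list(struct.unpack(f"<{n}d", struct.pack(f"<{n}Q", *bits)))
```

The round trip is lossless because the doubles are only reinterpreted as integers, never rounded. How much this helps depends entirely on the data, so compare `len(xor_delta_compress(values))` against a plain `bz2.compress` of the same values on your own files.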
In my local experiments, the above achieves massive compression for GpsTime (~10x versus 2x we currently achieve with zstd). I believe that will get TileDB to be on par with LAZ in terms of data sizes.
The reason why we don't use laz-perf off-the-shelf is that TileDB is a columnar format (like Parquet) and stores the values of each field/attribute in separate files. If we coalesced the fields, then we would hinder the ability to rapidly subselect on a subset of the fields, so performance would be impacted significantly. I believe that the new compressor we are working on will achieve the desired compression ratio.
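To illustrate the columnar point, here is a toy sketch (not TileDB code) in which each attribute is compressed into its own buffer, so reading one attribute never touches the others. The function names are hypothetical, every attribute is assumed to be a list of doubles, and zlib is used only because it ships with Python; it stands in for whichever compressor is configured per attribute.

```python
import struct
import zlib


def write_columnar(points):
    """Compress each attribute separately.

    points: dict mapping attribute name -> list of floats.
    Returns a dict mapping attribute name -> compressed bytes.
    """
    return {
        name: zlib.compress(struct.pack(f"<{len(values)}d", *values))
        for name, values in points.items()
    }


def read_attribute(store, name):
    """Decompress only the requested attribute's buffer."""
    raw = zlib.decompress(store[name])
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

A codec that interleaves all fields of a point into a single stream would have to decode that whole stream even when only GpsTime is requested, which is the trade-off described above.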
I'll keep you posted on progress on this issue. Thanks for reaching out!
Thanks for the response @stavrospapadopoulos! I'll keep my eyes on this repo for the next release. Sounds like some good improvements are coming!
Just noticed the floating scaling compressor in the latest release, 2.11. Any thoughts on if this would improve pointcloud storage/compression?
Hi @ryan-salo, it probably will for the case of X, Y and Z. Please stay tuned though, we are working on another compressor that will further improve pointcloud storage (specifically the GpsTime field). We'll experiment with all new compressors and select the best defaults in our PDAL ingestor.
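As a rough illustration of why scaling floats can help X, Y and Z, the sketch below quantizes coordinates to integers with a scale and offset, similar in spirit to how LAS stores coordinates. This is the generic idea only, not the TileDB filter's implementation; the function names and the scale/offset values are assumptions chosen for the example.

```python
def scale_to_int(values, scale, offset):
    """Map each float to the nearest integer multiple of `scale` above `offset`."""
    return [round((v - offset) / scale) for v in values]


def scale_to_float(ints, scale, offset):
    """Approximate inverse; the error per value is at most about scale / 2."""
    return [i * scale + offset for i in ints]


# Example: centimetre precision around an offset of 100.0.
xs = [100.25, 100.5, 101.75]
quantized = scale_to_int(xs, scale=0.01, offset=100.0)
restored = scale_to_float(quantized, scale=0.01, offset=100.0)
```

The small integers that result usually compress better than raw doubles, at the cost of precision beyond the chosen scale, so the scale should match the real precision of the survey data.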