PMTiles Compression (spec v3)

Options:

Gzip compression: requires a library like pako, may be expensive
Cap'n Proto packing: https://capnproto.org/encoding.html
Protobuf varints: https://developers.google.com/protocol-buffers/docs/encoding
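
For reference, a minimal sketch of the Protobuf-style varint option: each byte carries 7 bits of payload, low-order group first, and the high bit signals that another byte follows. The function names here are made up for illustration and are not from any PMTiles library.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low group first)."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # no continuation bit: last byte
            return result, pos
        shift += 7
```

Small values cost one byte and anything below 16384 costs two, which is why this pairs well with delta-encoded columns where most values are small.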

Use cases:

Dense tile pyramids
Sparse pyramids (tippecanoe output)

Apr 25 '22 06:04 bdon

Add tile type as a required metadata field (png, jpg, mvt, etc.)

Jun 14 '22 07:06 bdon

Flaws in current design:

The fixed-size 512000-byte header is wasteful
Index performs poorly for certain cases (panning at leaf level and leaf+level 1, especially)
ZXY addressing wastes ID space

New design:

All internal ID storage is based on a Hilbert Tile ID
Leaf directories are a configurable-size batching of the ID space (by default again 21845)
The first 21845 entries are top-level entries, recognizing that overview tiles are more frequently accessed
Leaf directories can be batched recursively: see FlatGeobuf https://worace.works/2022/03/12/flatgeobuf-implementers-guide/
Offsets in indexes should be relative to the start of the data section, allowing relocation
TileId, Offset, Length should be delta-encoded before gzip-compression.
A metadata flag clustered:true indicates that the tile order on disk matches TileId order
Mandate GZIP for vector tile content (ensure edge can re-encode to Brotli efficiently)
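
A rough sketch of what a Hilbert Tile ID could look like. The notes above do not pin down the numbering, so this assumes IDs are assigned zoom by zoom (all of zoom 0, then zoom 1, and so on), with tiles inside a zoom ordered along a Hilbert curve using the standard xy-to-distance algorithm. The function name zxy_to_tile_id is hypothetical.

```python
def zxy_to_tile_id(z: int, x: int, y: int) -> int:
    """Map a ZXY tile address to a single integer ID (assumed scheme)."""
    n = 1 << z                       # tiles per side at this zoom
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x/y out of range for zoom")

    # Number of tiles in all lower zooms: 4^0 + 4^1 + ... + 4^(z-1).
    base = (4 ** z - 1) // 3

    # Standard Hilbert curve xy -> distance conversion on an n x n grid.
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical curve.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return base + d


# Usage: the four zoom-1 tiles get IDs 1..4 in curve order.
for xy in [(0, 0), (0, 1), (1, 1), (1, 0)]:
    print(xy, zxy_to_tile_id(1, *xy))
```

Under this assumed numbering, zooms 0 through 7 occupy exactly IDs 0..21844, since (4^8 - 1) / 3 = 21845, which lines up with the 21845 figure used for the top-level entries.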

Unsolved problems:

Relocation problem with offset of directory IDs (directories store "leaf level" offset?)
Specific algorithm for clustering while also working around deduplication
Should indexes go at the end or the beginning?
How to store "directory" bit

Jun 16 '22 02:06 bdon

Target metric: (size of index in bytes / total # of tiles in archive) = average number of bytes per tile entry. Currently this is 17; my experimental results are 3-5 bytes per entry. Can we do better?
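
One way to measure this metric, as a sketch only: delta-encode TileId, Offset and Length, write the deltas as varints, gzip the result, and divide the compressed size by the entry count. The column-by-column layout, the zigzag step for negative deltas, and all function names are assumptions made for this example, not a proposed wire format.

```python
import gzip
from typing import List, Tuple

Entry = Tuple[int, int, int]  # (tile_id, offset, length)


def zigzag(n: int) -> int:
    """Map signed to unsigned so small negative deltas stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z: int) -> int:
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def write_varint(out: bytearray, value: int) -> None:
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def serialize(entries: List[Entry]) -> bytes:
    """Entry count, then each column delta-encoded as varints, then gzip."""
    out = bytearray()
    write_varint(out, len(entries))
    for column in range(3):
        prev = 0
        for entry in entries:
            write_varint(out, zigzag(entry[column] - prev))
            prev = entry[column]
    return gzip.compress(bytes(out))


def deserialize(blob: bytes) -> List[Entry]:
    buf = gzip.decompress(blob)
    count, pos = read_varint(buf, 0)
    columns = []
    for _ in range(3):
        prev, values = 0, []
        for _ in range(count):
            z, pos = read_varint(buf, pos)
            prev += unzigzag(z)      # undo the delta encoding
            values.append(prev)
        columns.append(values)
    return list(zip(*columns))


def bytes_per_entry(entries: List[Entry]) -> float:
    return len(serialize(entries)) / len(entries)


# Synthetic, perfectly regular input; real archives will compress worse.
sample = [(i, i * 1000, 1000) for i in range(10000)]
print(bytes_per_entry(sample))
```

The synthetic input is only there to make the script run; a meaningful number needs the directory of a real archive.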

Jun 16 '22 09:06 bdon

Parquet encodings: https://parquet.apache.org/docs/file-format/data-pages/encodings/

Jun 16 '22 13:06 bdon

Extend the spec to compress entire subtrees (e.g. ocean tiles)?

Jun 17 '22 01:06 bdon

Move certain fields into header instead of metadata, to avoid blocking on large metadata
- bbox, minzoom, maxzoom, tile_type, compression, clustered

Jun 18 '22 15:06 bdon

Benchmark against Parquet size

Jun 20 '22 01:06 bdon

Consider if we should add leaders/trailers: https://gdal.org/drivers/raster/cog.html

The COG 16 KB assumptions seem good

Jul 13 '22 13:07 bdon

Ghost sections / extensions, example: storing offset->hash in a ghost section to enable efficient diffing of two PMTiles archives
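
A sketch of how the diffing example might work, assuming each archive exposes a directory (tile ID to offset) plus a ghost section (offset to content hash). Composing the two gives tile ID to hash, and comparing those maps finds changed tiles without reading tile data. The in-memory dict representation and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def tile_hashes(directory: Dict[int, int],
                offset_hashes: Dict[int, bytes]) -> Dict[int, bytes]:
    """Compose tile_id -> offset with offset -> hash into tile_id -> hash."""
    return {tile_id: offset_hashes[offset]
            for tile_id, offset in directory.items()}


def diff_archives(old: Dict[int, bytes],
                  new: Dict[int, bytes]) -> Dict[str, List[int]]:
    """Compare two tile_id -> hash maps and list what differs."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(t for t in old.keys() & new.keys()
                          if old[t] != new[t]),
    }


# Usage with made-up tile contents; offsets differ between the archives,
# which is why the comparison goes through hashes rather than offsets.
def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


old = tile_hashes({1: 0, 2: 100}, {0: digest(b"land"), 100: digest(b"sea")})
new = tile_hashes({1: 0, 2: 64, 3: 128},
                  {0: digest(b"land"), 64: digest(b"coast"), 128: digest(b"sea")})
print(diff_archives(old, new))
```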

Aug 09 '22 23:08 bdon