Compression (spec v3)

Open bdon opened this issue 3 years ago • 9 comments

Options:

  • Gzip compression - requires a library like pako, may be expensive
  • Cap'n Proto packing: https://capnproto.org/encoding.html
  • Protobuf varints: https://developers.google.com/protocol-buffers/docs/encoding
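
For reference, a minimal sketch of the Protobuf-style base-128 varint from the last option: 7 payload bits per byte, least-significant group first, high bit set on every byte except the last. The function names are illustrative only and not part of any PMTiles API.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Small values such as delta-encoded tile IDs fit in a single byte this way, which is the appeal over fixed-width fields.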

Use cases:

  • Dense tile pyramids
  • Sparse pyramids (tippecanoe output)

bdon avatar Apr 25 '22 06:04 bdon

add tile type as a required metadata field (png, jpg, mvt, etc)

bdon avatar Jun 14 '22 07:06 bdon

Flaws in current design:

  • The fixed-size 512000-byte header is wasteful
  • Index performs poorly for certain cases (panning at leaf level and leaf+level 1, especially)
  • Waste of ID space ZXY

New design:

  • All internal ID storage is based on a Hilbert Tile ID
  • Leaf directories are a configurable-size batching of the ID space (defaulting again to 21845 entries)
  • The first 21845 entries are top-level entries, recognizing that overview tiles are more frequently accessed
  • Leaf directories can be batched recursively: see FlatGeobuf https://worace.works/2022/03/12/flatgeobuf-implementers-guide/
  • Offsets in indexes should be relative to the start of the data section, allowing relocation
  • TileId, Offset, Length should be delta-encoded before gzip-compression.
  • A metadata flag clustered:true indicates that the tile order on disk matches TileId order
  • Mandate GZIP for vector tile content (ensure edge can re-encode to Brotli efficiently)
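
One possible way to derive a Hilbert Tile ID from Z/X/Y, as a sketch: number zoom levels consecutively (zoom 0 gets ID 0, zoom 1 gets IDs 1-4, and so on) and order tiles within a zoom along a Hilbert curve. The zoom-by-zoom numbering and the function name `zxy_to_tile_id` are assumptions for illustration; the thread does not fix an exact scheme.

```python
def zxy_to_tile_id(z: int, x: int, y: int) -> int:
    """Map (z, x, y) to a single integer: zoom base offset + Hilbert index."""
    n = 1 << z  # tiles per side at this zoom
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x/y out of range for zoom level")

    # Number of tiles in all lower zooms: 4^0 + 4^1 + ... + 4^(z-1).
    base = (4 ** z - 1) // 3

    # Standard iterative (x, y) -> distance conversion along a Hilbert curve.
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return base + d
```

Under this assumed numbering, IDs 0 through 21844 cover exactly zooms 0-7, so `zxy_to_tile_id(8, 0, 0)` is 21845, which lines up with the top-level entry count above.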

Unsolved problems:

  • Relocation problem with offset of directory IDs (directories store "leaf level" offset?)
  • Specific algorithm for clustering while also working around deduplication
  • Should indexes go at the end or the beginning?
  • How to store "directory" bit

bdon avatar Jun 16 '22 02:06 bdon

Target metric: (size of index in bytes / total # of tiles in archive) = average number of bytes per tile entry. Currently this is 17; my experimental results give 3-5 bytes per entry. Can we do better?
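
A rough sketch of how the bytes-per-entry figure could be measured for a delta-encoded, gzip-compressed index. The columnar layout, the fixed 64-bit deltas, and the synthetic clustered archive are all assumptions made for the sake of a runnable example, not the experimental setup behind the numbers above.

```python
import gzip
import random
import struct
from itertools import accumulate


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def serialize_index(entries):
    """entries: list of (tile_id, offset, length), sorted by tile_id."""
    n = len(entries)
    columns = list(zip(*entries)) or [(), (), ()]
    raw = struct.pack("<I", n)
    for col in columns:
        # Signed 64-bit deltas: length deltas can be negative.
        raw += struct.pack(f"<{n}q", *delta_encode(col))
    return gzip.compress(raw)


def deserialize_index(blob):
    raw = gzip.decompress(blob)
    (n,) = struct.unpack_from("<I", raw, 0)
    cols = []
    pos = 4
    for _ in range(3):
        deltas = struct.unpack_from(f"<{n}q", raw, pos)
        cols.append(list(accumulate(deltas)))  # undo the delta encoding
        pos += 8 * n
    return list(zip(*cols))


def bytes_per_entry(entries):
    """The target metric: compressed index size divided by tile count."""
    return len(serialize_index(entries)) / len(entries)


def synthetic_entries(count, seed=0):
    """Hypothetical clustered archive: consecutive IDs, contiguous offsets."""
    rng = random.Random(seed)
    entries = []
    offset = 0
    for tile_id in range(count):
        length = rng.randint(200, 2000)
        entries.append((tile_id, offset, length))
        offset += length
    return entries
```

Calling `bytes_per_entry(synthetic_entries(10000))` gives a number directly comparable to the 17 bytes per entry of the current format; swapping the fixed-width deltas for varints would be the obvious next experiment.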

bdon avatar Jun 16 '22 09:06 bdon

Parquet encodings: https://parquet.apache.org/docs/file-format/data-pages/encodings/

bdon avatar Jun 16 '22 13:06 bdon

  • extend spec to compress entire subtrees (ocean tiles) ?

bdon avatar Jun 17 '22 01:06 bdon

  • Move certain fields into header instead of metadata, to avoid blocking on large metadata
    • bbox, minzoom, maxzoom, tile_type, compression, clustered
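
To illustrate the idea, a sketch of those fields packed into a small fixed-width header that a client can decode from one short read, without touching the metadata blob. The byte layout, the enum values, and every name here are hypothetical; nothing in this thread defines them.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class TileType(IntEnum):   # hypothetical numeric codes
    MVT = 1
    PNG = 2
    JPG = 3


class Compression(IntEnum):  # hypothetical numeric codes
    NONE = 0
    GZIP = 1


# Little-endian, no padding: 4 doubles (bbox), 4 unsigned bytes, 1 bool.
HEADER_FORMAT = "<4d4B?"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class Header:
    bbox: tuple          # (west, south, east, north)
    minzoom: int
    maxzoom: int
    tile_type: TileType
    compression: Compression
    clustered: bool

    def pack(self) -> bytes:
        return struct.pack(
            HEADER_FORMAT, *self.bbox, self.minzoom, self.maxzoom,
            int(self.tile_type), int(self.compression), self.clustered,
        )

    @classmethod
    def unpack(cls, buf: bytes) -> "Header":
        w, s, e, n, minzoom, maxzoom, tile_type, compression, clustered = (
            struct.unpack_from(HEADER_FORMAT, buf)
        )
        return cls((w, s, e, n), minzoom, maxzoom,
                   TileType(tile_type), Compression(compression), clustered)
```

Because the size is known up front, a reader can request the header together with the start of the index in its first range request, regardless of how large the metadata is.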

bdon avatar Jun 18 '22 15:06 bdon

  • Benchmark against Parquet size

bdon avatar Jun 20 '22 01:06 bdon

Consider if we should add leaders/trailers: https://gdal.org/drivers/raster/cog.html

the COG 16KB assumptions seem good

bdon avatar Jul 13 '22 13:07 bdon

Ghost sections / extensions, example: storing offset->hash in a ghost section to enable efficient diffing of two PMTiles archives
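
A sketch of how such an offset->hash section might be used for diffing. It assumes each archive exposes an index mapping tile ID to (offset, length), that an offset uniquely identifies a tile blob, and that SHA-256 is the hash; all of these, and the function names, are illustrative assumptions.

```python
import hashlib


def build_hash_section(tile_data: bytes, index: dict) -> dict:
    """index: tile_id -> (offset, length). Returns offset -> content hash."""
    section = {}
    for offset, length in index.values():
        if offset not in section:  # deduplicated tiles share one offset
            blob = tile_data[offset:offset + length]
            section[offset] = hashlib.sha256(blob).digest()
    return section


def diff_archives(index_a, hashes_a, index_b, hashes_b):
    """Compare two archives using only their indexes and hash sections."""
    added = sorted(index_b.keys() - index_a.keys())
    removed = sorted(index_a.keys() - index_b.keys())
    changed = sorted(
        tile_id
        for tile_id in index_a.keys() & index_b.keys()
        if hashes_a[index_a[tile_id][0]] != hashes_b[index_b[tile_id][0]]
    )
    return added, removed, changed
```

The point is that the comparison reads only the index and the ghost section of each archive, never the tile data itself.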

bdon avatar Aug 09 '22 23:08 bdon