planetiler [FEATURE] PMTiles output format

Is your feature request related to a problem? Please describe.

Currently, Planetiler outputs MBTiles, which requires a tile server to serve tiles. Static file hosting, such as GitHub Pages, is not enough.

Describe the solution you'd like

PMTiles can be used to serve vector tiles with range requests and they don't require a tile server. Planetiler could implement a pmtiles writer.

Describe alternatives you've considered

@bdon created an MBTiles-to-PMTiles converter:

pip install pmtiles
pmtiles-convert TILES.mbtiles TILES.pmtiles

Additional context

The powerlines example uses pmtiles: https://github.com/wipfli/powerlines-switzerland

Feb 27 '22 07:02 wipfli

I think this makes sense. We could probably migrate the --mbtiles=output.mbtiles option to --output=output.mbtiles or --output=output.pmtiles and switch the writer implementation based on file type. It sounds like the format is pretty straightforward, but Brandon pointed out that it would be beneficial to change the tile order to collocate nearby tiles. He recommended Hilbert curve order, but tile pyramid order might satisfy a similar goal and be a bit easier to implement?
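A minimal sketch of the extension-based switch suggested above. The `OutputFormats` class and its `Format` enum are hypothetical names used only for illustration; they are not Planetiler's actual API.

```java
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical helper: pick an archive format from the --output file extension. */
class OutputFormats {
  enum Format { MBTILES, PMTILES }

  static Format fromPath(Path output) {
    // Compare case-insensitively so "OUT.PMTiles" is handled like "out.pmtiles".
    String name = output.getFileName().toString().toLowerCase(Locale.ROOT);
    if (name.endsWith(".mbtiles")) {
      return Format.MBTILES;
    }
    if (name.endsWith(".pmtiles")) {
      return Format.PMTILES;
    }
    throw new IllegalArgumentException("Unsupported output extension: " + name);
  }

  public static void main(String[] args) {
    System.out.println(fromPath(Path.of("output.pmtiles"))); // prints PMTILES
  }
}
```

The caller would then construct the matching writer for the returned format, so the rest of the pipeline does not need to know which archive type is in use.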

Feb 27 '22 13:02 msbarry

Would it make sense to structure this as a separate Java library? If so, can that live alongside the python/js implementations at https://github.com/protomaps/PMTiles or should it live in its own Git repository?

Feb 28 '22 03:02 bdon

@bdon a separate library would be nice, then the wrapper in planetiler would be pretty minimal. At the simplest it would need an API like:

try (var pmtiles = new PMTiles(pathOrOutputStream)) {
  for (var tile : tiles) {
    pmtiles.writeTile(tile.x, tile.y, tile.z, tile.data);
  }
} // close() flushes the index leaves, or could have an explicit finalize() call like the go library

For performance optimizations it might make sense to expose the hashing function and an API to write a tile with a known hash as well so the writer could avoid hashing the same bytes over and over again, but we could just start with something simple and add that after profiling if necessary.
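A rough sketch of what such a "write with a known hash" API could look like. The `DedupSketch` class is hypothetical, and CRC32 is used only to keep the example within the standard library; a real writer would need a stronger hash or a byte-for-byte comparison to rule out collisions.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical writer that stores identical tile contents only once. */
class DedupSketch {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  private final Map<Long, Integer> offsetByHash = new HashMap<>();

  /** Exposed so callers can hash once and reuse the result. */
  static long hash(byte[] bytes) {
    CRC32 crc = new CRC32();
    crc.update(bytes);
    return crc.getValue();
  }

  /** Convenience overload: hashes the bytes, then delegates. */
  int writeTile(byte[] bytes) {
    return writeTile(bytes, hash(bytes));
  }

  /** Returns the offset of the tile data, reusing an earlier copy when the hash is known. */
  int writeTile(byte[] bytes, long knownHash) {
    Integer existing = offsetByHash.get(knownHash);
    if (existing != null) {
      return existing; // same contents already stored (assuming no hash collision)
    }
    int offset = data.size();
    data.write(bytes, 0, bytes.length);
    offsetByHash.put(knownHash, offset);
    return offset;
  }

  int bytesStored() {
    return data.size();
  }
}
```

With this shape, a caller that already knows a tile repeats (for example, an empty ocean tile) can compute the hash once and pass it to every subsequent `writeTile` call.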

Feb 28 '22 10:02 msbarry

Also @bdon could you elaborate on the tile ordering optimization? Is the main reason to put nearby tiles into the same index leaves? Planetiler packs tile x/y/z coordinates into a 32-bit integer that defines the order in which tiles are emitted, so I'd have to express a different ordering strategy as a different mapping from x/y/z to int.
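To make "a mapping from x/y/z to int" concrete, here is one possible encoding that sorts tiles by zoom, then x, then y. It is an illustrative assumption, not necessarily the encoding Planetiler uses; `TileOrderSketch` is a hypothetical name. Swapping in another ordering means replacing `encode` and `decode` with a different bijection.

```java
/** Hypothetical z/x/y ordering packed into one int (valid for zooms 0 through 15). */
class TileOrderSketch {
  /** Number of tiles in all zoom levels below z: (4^z - 1) / 3. */
  static int zoomOffset(int z) {
    return (int) (((1L << (2 * z)) - 1) / 3);
  }

  /** Tiles sort by zoom first, then by x, then by y. */
  static int encode(int x, int y, int z) {
    return zoomOffset(z) + (x << z) + y;
  }

  /** Inverse of encode: returns {x, y, z}. */
  static int[] decode(int id) {
    int z = 0;
    while (z < 15 && zoomOffset(z + 1) <= id) {
      z++;
    }
    int pos = id - zoomOffset(z);
    return new int[] {pos >> z, pos & ((1 << z) - 1), z};
  }

  public static void main(String[] args) {
    System.out.println(encode(1, 1, 1)); // prints 4
  }
}
```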

Feb 28 '22 13:02 msbarry

The tile ordering refers to the order of the tiles in the archive; as of spec v2, the order of their entries in the index is strictly defined (ascending z/x/y). If tiles are in Hilbert order in the archive, they are guaranteed to be nearby in the file if they're nearby in 2D. This locality makes a big latency difference if you're serving from disk and the OS is paging, but usually doesn't make a difference for cloud storage (depends on how it's implemented).
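For reference, a sketch of the textbook mapping from a grid cell to its position along a Hilbert curve. This is the standard algorithm, not code taken from any PMTiles implementation, and `HilbertSketch` is a hypothetical name. Here `n` is the grid width (2^z at zoom z) and must be a power of two.

```java
/** Textbook Hilbert curve mapping for an n-by-n grid, n a power of two. */
class HilbertSketch {
  static long xyToIndex(int n, int x, int y) {
    long d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
      int rx = (x & s) > 0 ? 1 : 0;
      int ry = (y & s) > 0 ? 1 : 0;
      // Each quadrant at this scale contributes s*s cells to the distance.
      d += (long) s * s * ((3 * rx) ^ ry);
      // Rotate/flip the quadrant so the sub-curve is oriented consistently.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x;
          y = n - 1 - y;
        }
        int t = x;
        x = y;
        y = t;
      }
    }
    return d;
  }

  public static void main(String[] args) {
    System.out.println(xyToIndex(2, 1, 0)); // prints 3
  }
}
```

Consecutive indexes along the curve always land on adjacent cells, which is the locality property that keeps neighboring tiles close together in the file.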

Feb 28 '22 13:02 bdon

I also found that writing pmtiles output is substantially faster (3 minutes for the planet), so I'm going to include a first pass of this in my "reducing single-threaded bottlenecks" workstream. I'll write a pmtiles class with the goal of eventually extracting it to https://github.com/protomaps/PMTiles.

Apr 08 '22 10:04 msbarry

Talking with @bdon, the pmtiles format is going through a couple of iterations between now and August so let's wait to add native pmtiles output until after that solidifies more. @bdon feel free to ping this issue when you think the spec is in a stable state to build against.

Jun 10 '22 10:06 msbarry

@bdon would it be possible to use the go-pmtiles implementation here in Java in Planetiler?

Nov 22 '22 11:11 wipfli

I ran Planetiler with the Shortbread configurable schema on the full planet. It created an output file of roughly 68 GB. Then I converted the .mbtiles to .pmtiles (25 minutes) and now I am uploading the file to R2 (roughly 30 minutes).

What would be amazing is if Planetiler could directly produce PMTiles and stream them to an S3-compatible storage provider...

Nov 22 '22 12:11 wipfli

> @bdon would it be possible to use the go-pmtiles implementation here in Java in Planetiler?

The plan is to output the PMTiles v3 format directly in the Java code. It's on my plate, but I need to finish up the Tippecanoe output first :)

Nov 22 '22 14:11 bdon

This is cool to see. What is the status? Almost done?

Feb 25 '23 01:02 hallahan

Getting closer. #502 wraps up most of the internal bits but we will need to also expose this output format to configurations/command line for the next point release.

Feb 28 '23 02:02 bdon

I am really looking forward to #502.

Is it possible with rclone or something to upload the pmtiles file to an S3-compatible storage while planetiler is still writing?

Just to share my numbers: my custom planet mbtiles file is 45 GB and Planetiler takes 15 minutes to write it. Converting the file from mbtiles to pmtiles then takes 18 minutes, and uploading the pmtiles file to Cloudflare takes another 13 minutes.

Mar 08 '23 17:03 wipfli

The pmtiles writer is going to write the whole file sequentially, then when it finishes it will go back to the beginning and write the header and root directory. I'm not sure if that pattern would work integrating with a third-party upload tool?
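The write pattern described above can be sketched as follows. The 16-byte header layout here (tile count plus data length) is made up for illustration and is not the PMTiles header; `HeaderLastSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical writer: reserve header space, stream the data, then fill in the header. */
class HeaderLastSketch {
  static final int HEADER_BYTES = 16; // made-up layout: tile count (8 bytes) + data length (8 bytes)

  static void write(Path path, List<byte[]> tiles) throws IOException {
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      // Skip past the header region; its contents are not known yet.
      ch.position(HEADER_BYTES);
      long dataBytes = 0;
      for (byte[] tile : tiles) {
        ByteBuffer buf = ByteBuffer.wrap(tile);
        while (buf.hasRemaining()) {
          dataBytes += ch.write(buf); // sequential append
        }
      }
      // Now that the totals are known, go back and write the header at offset 0.
      ByteBuffer header = ByteBuffer.allocate(HEADER_BYTES);
      header.putLong(tiles.size()).putLong(dataBytes);
      header.flip();
      long pos = 0;
      while (header.hasRemaining()) {
        pos += ch.write(header, pos); // positional write, does not move the channel position
      }
    }
  }
}
```

The final seek back to offset 0 is the step that an external tool uploading the file while it is still being written would not be able to anticipate.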

Theoretically I think planetiler could do the upload directly using the S3 multi-part upload API - it would just write the first part last once it knows what the first header/root directory will look like.

At the very least, #502 should combine your first 2 steps into one step that takes less than 15 minutes.

Mar 08 '23 20:03 msbarry

It should be simple to run planetiler and rclone in sequence to perform the upload.

My thought for the next v4 spec of pmtiles (backwards compatible, don't worry) is to allow for the header and root directories to be at the end of the archive. This would make the entire format streamable, meaning planetiler could write to storage as it's assembling the tiles, saving time and local disk space.

To make this work however, we need to validate that every storage platform pmtiles v3 runs on supports end-addressing HTTP range requests correctly.
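As an illustration of what such a validation could check, the sketch below builds an end-addressed (suffix) range request with the standard `java.net.http` API and verifies that a `Content-Range` response header really describes the tail of the file. The `SuffixRangeSketch` class and its method names are hypothetical; sending the request and reading the response are left to the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for testing suffix range support on a storage platform. */
class SuffixRangeSketch {
  private static final Pattern CONTENT_RANGE = Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+)");

  /** Builds a GET request for the last {@code lastBytes} bytes of the resource. */
  static HttpRequest suffixRequest(URI archive, long lastBytes) {
    return HttpRequest.newBuilder(archive)
        .header("Range", "bytes=-" + lastBytes)
        .GET()
        .build();
  }

  /** True if a Content-Range value such as "bytes 900-999/1000" covers exactly the file's tail. */
  static boolean isSuffixOf(String contentRange, long lastBytes) {
    Matcher m = CONTENT_RANGE.matcher(contentRange);
    if (!m.matches()) {
      return false;
    }
    long start = Long.parseLong(m.group(1));
    long end = Long.parseLong(m.group(2));
    long total = Long.parseLong(m.group(3));
    // A file shorter than lastBytes should be returned whole, starting at 0.
    return end == total - 1 && start == Math.max(0, total - lastBytes);
  }
}
```

A platform would pass if the response status is 206 and `isSuffixOf` returns true for the returned `Content-Range` header.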

However, what @msbarry said about multipart uploads out-of-order would be even better and not require a spec revision. I'm not sure if that multipart behavior is consistent across storage platforms though.

Mar 09 '23 02:03 bdon

Resolved by #502

Mar 13 '23 18:03 msbarry

Amazing, I need to try it out. Thanks @bdon for writing it and thanks @msbarry for the review!

Mar 13 '23 18:03 wipfli

Thanks a lot for this. Can we use the command line to generate PMTiles?

Mar 15 '23 15:03 laurentdiazfr

Almost... I'm working on a change now so you can say --output=result.pmtiles to use the new functionality. Should be ready in a day or two.

Mar 15 '23 21:03 msbarry

I did a comparison between the direct pmtiles writer in planetiler and the mbtiles writer + conversion afterwards to pmtiles. I did a planet run with my custom map tileset https://github.com/wipfli/swiss-map. Here is the result:

New pmtiles writer total duration: 9146 seconds
Previous mbtiles writer followed by conversion to pmtiles: 9171 seconds

This is on a 12 core, 128 GB machine. The logs are available here: https://gist.github.com/wipfli/17bb8ad8d123f7d93313417dc7d4fac5

It is surprising that the new pmtiles writer does not outperform the old way significantly. Did I somehow mess up some settings?

Mar 21 '23 16:03 wipfli

Archive writing is the only part that gets faster with pmtiles.

Here's what I see for pmtiles:

2:32:17 INF - 	archive   1h14m23s cpu:13h38m25s gc:3m53s avg:11
2:32:17 INF - 	  read    1x(8% 5m43s sys:55s wait:1h4m35s done:9s)
2:32:17 INF - 	  encode 11x(94% 1h9m36s sys:5s wait:10s done:9s)
2:32:17 INF - 	  write   1x(4% 2m50s sys:1m25s wait:1h10m42s) <<<<<<<<<<<---------- pmtiles

and for mbtiles:

2:32:41 INF - 	archive   1h15m8s cpu:13h47m35s gc:4m37s avg:11
2:32:41 INF - 	  read    1x(8% 6m7s sys:1m5s wait:1h4m1s)
2:32:41 INF - 	  encode 11x(92% 1h9m25s sys:7s wait:36s)
2:32:41 INF - 	  write   1x(11% 8m sys:1m17s wait:1h3m14s) <<<<<<<<<<<<--------------- mbtiles

the archive time is dominated by encode since you only have 12 cores - if you run on a machine with 64-100+ cores then encode starts to take less time and write dominates.

Mar 21 '23 18:03 msbarry

Nice thanks!

Mar 21 '23 19:03 wipfli

I had a bug in my script: I created swissmap.mbtiles with planetiler but then used pmtiles convert output.mbtiles output.pmtiles, and it turns out that output.mbtiles was a 155 MB file while swissmap.mbtiles is 35 GB...

I then checked how long pmtiles convert swissmap.mbtiles swissmap.pmtiles takes, and it turns out to be 22 minutes.

So the result in my custom planet tile set is:

old method (output mbtiles and then convert): 9171 s + 22 min = 10491 s
new method (direct pmtiles output): 9146 s

So the new method is almost 13 percent faster!
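The arithmetic behind that figure, as a small sketch (the `SpeedupSketch` class is only for illustration; the durations are the ones reported above):

```java
/** Recomputes the speedup from the reported durations. */
class SpeedupSketch {
  /** Percentage of the old duration saved by the new method. */
  static double percentFaster(double oldSeconds, double newSeconds) {
    return 100.0 * (oldSeconds - newSeconds) / oldSeconds;
  }

  public static void main(String[] args) {
    double oldMethod = 9171 + 22 * 60; // mbtiles run plus 22-minute conversion = 10491 s
    double newMethod = 9146;           // direct pmtiles output
    System.out.println(percentFaster(oldMethod, newMethod));
  }
}
```

The saving is 1345 s out of 10491 s, or about 12.8 percent.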

Mar 22 '23 10:03 wipfli

Hi, one question: I am running this command to generate pmtiles:

sudo java -Xmx1g -jar planetiler.jar --download --area=monaco --output=monaco.pmtiles

However, no monaco.pmtiles file is being created inside the data folder, only an output.mbtiles file.

What am I missing?

Thanks

Mar 30 '23 14:03 sing78

It creates output.pmtiles in the current folder, no?

Mar 30 '23 19:03 wipfli

Oh you're probably using the latest release jar and I haven't made a release for a while. I should do a release and get that up to date!

Mar 30 '23 23:03 msbarry
