planetiler
[FEATURE] PMTiles output format
Is your feature request related to a problem? Please describe.
Currently, Planetiler outputs MBTiles, which requires a tile server: static file hosting such as GitHub Pages is not enough to serve tiles.
Describe the solution you'd like
PMTiles can be used to serve vector tiles with range requests and they don't require a tile server. Planetiler could implement a pmtiles writer.
Describe alternatives you've considered
@bdon created an mbtiles-to-pmtiles converter:
pip install pmtiles
pmtiles-convert TILES.mbtiles TILES.pmtiles
Additional context
The powerlines example uses pmtiles: https://github.com/wipfli/powerlines-switzerland
I think this makes sense. We'd probably migrate the --mbtiles=output.mbtiles option to --output=output.mbtiles or --output=output.pmtiles and switch the writer implementation based on file type. It sounds like the format is pretty straightforward, but Brandon pointed out that it would be beneficial to change the tile order to collocate nearby tiles. He recommended Hilbert curve order, but tile pyramid order might satisfy a similar goal and be a bit easier to implement?
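Switching the writer on the output file type could look roughly like the sketch below. The names `OutputFormat` and `fromPath` are hypothetical and chosen for illustration only; this is not Planetiler's actual API.

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical sketch: pick an archive format from the output file's extension.
enum OutputFormat {
  MBTILES, PMTILES;

  static OutputFormat fromPath(Path output) {
    // Compare case-insensitively so "OUT.PMTILES" is accepted too.
    String name = output.getFileName().toString().toLowerCase(Locale.ROOT);
    if (name.endsWith(".mbtiles")) {
      return MBTILES;
    }
    if (name.endsWith(".pmtiles")) {
      return PMTILES;
    }
    throw new IllegalArgumentException("Unsupported output extension: " + name);
  }
}
```

A caller would then construct whichever writer matches the returned value, so a single --output option can cover both formats.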
Would it make sense to structure this as a separate Java library? If so, can that live alongside the python/js implementations at https://github.com/protomaps/PMTiles or should it live in its own Git repository?
@bdon a separate library would be nice, then the wrapper in planetiler would be pretty minimal. At the simplest it would need an API like:
try (var pmtiles = new PMTiles(pathOrOutputStream)) {
  for (var tile : tiles) {
    pmtiles.writeTile(tile.x, tile.y, tile.z, tile.data);
  }
} // close() flushes the index leaves, or could have an explicit finalize() call like the go library
For performance optimizations it might make sense to expose the hashing function and an API to write a tile with a known hash as well so the writer could avoid hashing the same bytes over and over again, but we could just start with something simple and add that after profiling if necessary.
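As a rough illustration of the "known hash" idea, the sketch below stores each distinct payload once and offers an overload for callers that have already hashed the bytes. Everything here is an assumption made for the example: the class name `DedupingTileStore`, the choice of FNV-1a as the hash, and the in-memory buffer. It is not the PMTiles library's API.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: store each distinct tile payload only once.
class DedupingTileStore {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  private final Map<Long, Long> offsetByHash = new HashMap<>();

  // 64-bit FNV-1a, an arbitrary choice for illustration.
  static long hash(byte[] bytes) {
    long h = 0xcbf29ce484222325L;
    for (byte b : bytes) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    return h;
  }

  // Convenience overload that hashes the bytes itself.
  long writeTile(byte[] bytes) {
    return writeTile(bytes, hash(bytes));
  }

  // Overload for callers that already know the hash. Returns the payload offset.
  // Caveat: this trusts the hash, so a collision would alias two different tiles.
  long writeTile(byte[] bytes, long knownHash) {
    Long existing = offsetByHash.get(knownHash);
    if (existing != null) {
      return existing; // same content seen before, reuse its offset
    }
    long offset = data.size();
    data.write(bytes, 0, bytes.length);
    offsetByHash.put(knownHash, offset);
    return offset;
  }

  long bytesStored() {
    return data.size();
  }
}
```

With this shape, a producer that emits the same ocean tile many times can compute the hash once and pass it along with every write.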
Also @bdon could you elaborate on the tile ordering optimization? Is the main reason to put nearby tiles into the same index leaves? Planetiler packs tile x/y/z coordinates into a 32-bit integer that defines the order in which tiles are emitted, so I'd have to express a different ordering strategy as a different mapping from x/y/z to int.
The tile ordering refers to the order of the tiles in the archive; as of spec v2, the order of their entries in the index is strictly defined (ascending z/x/y). If tiles are in Hilbert order in the archive, they are guaranteed to be nearby in the file if they're nearby in 2D. This locality makes a big latency difference if you're serving from disk and the OS is paging, but usually doesn't make a difference for cloud storage (depends on how it's implemented).
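To make the "different mapping from x/y/z" concrete, here is a minimal sketch of the standard Hilbert curve conversion from tile x/y to a position along the curve within one zoom level. The class and method names are hypothetical, and this is not claimed to match the tile ID scheme of any particular PMTiles spec version. The property it illustrates is that consecutive positions along the curve are always edge-adjacent tiles.

```java
// Hypothetical sketch: position of tile (x, y) along a Hilbert curve at zoom z.
class HilbertOrder {
  static long index(int z, int x, int y) {
    long n = 1L << z; // tiles per side at this zoom
    long px = x;
    long py = y;
    long d = 0;
    for (long s = n / 2; s > 0; s /= 2) {
      long rx = (px & s) > 0 ? 1 : 0;
      long ry = (py & s) > 0 ? 1 : 0;
      // Each quadrant at this level contributes s*s positions.
      d += s * s * ((3 * rx) ^ ry);
      // Rotate/flip so the sub-curve in the next level is oriented correctly.
      if (ry == 0) {
        if (rx == 1) {
          px = n - 1 - px;
          py = n - 1 - py;
        }
        long t = px;
        px = py;
        py = t;
      }
    }
    return d;
  }
}
```

Sorting the tiles of one zoom level by `HilbertOrder.index(z, x, y)` gives an emit order in which neighbors on the curve are neighbors on the map. A per-zoom offset would still be needed to combine zoom levels into a single ordering.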
I also found that writing pmtiles output is substantially faster (3 minutes for the planet), so I'm going to include a first pass of this in my "reducing single-threaded bottlenecks" workstream. I'll write a pmtiles class with the goal of eventually extracting it to https://github.com/protomaps/PMTiles.
Talking with @bdon, the pmtiles format is going through a couple of iterations between now and August so let's wait to add native pmtiles output until after that solidifies more. @bdon feel free to ping this issue when you think the spec is in a stable state to build against.
@bdon would it be possible to use the go-pmtiles implementation here in Java in Planetiler?
I ran Planetiler on the Shortbread configurable schema on the full planet. It created something like a 68 GB output file. Then I converted the .mbtiles to .pmtiles (25 minutes) and now I am uploading the file to R2 (roughly 30 minutes).
What would be amazing is if Planetiler could directly produce PMTiles and stream them to an S3-compatible storage provider...
The plan is to output the PMTiles v3 format directly in the Java code. It's on my plate, but I need to finish up the Tippecanoe output first :)
This is cool to see. What is the status? Almost done?
Getting closer. #502 wraps up most of the internal bits but we will need to also expose this output format to configurations/command line for the next point release.
I am really looking forward to #502.
Is it possible with rclone or something to upload the pmtiles file to an S3-compatible storage while planetiler is still writing?
Just to share my numbers: my custom planet mbtiles file is 45 GB and planetiler takes 15 minutes to write it. Then it takes 18 minutes to convert the file from mbtiles to pmtiles, and another 13 minutes to upload the pmtiles file to Cloudflare.
The pmtiles writer is going to write the whole file sequentially, then when it finishes it will go back to the beginning and write the header and root directory. I'm not sure if that pattern would work integrating with a third-party upload tool?
Theoretically I think planetiler could do the upload directly using the S3 multi-part upload API - it would just write the first part last once it knows what the first header/root directory will look like.
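The "write everything sequentially, then go back and fill in the header" pattern can be sketched as follows. This is only an illustration under stated assumptions: the 16-byte header (tile count plus body length) is made up and is NOT the PMTiles header layout, and `HeaderLastWriter` is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: write the body first, then backfill a fixed-size header.
class HeaderLastWriter {
  static final int HEADER_BYTES = 16; // made-up layout: two big-endian longs

  static void write(Path path, List<byte[]> tiles) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
      file.setLength(0);
      // Reserve space for the header; its contents are unknown until the end.
      file.write(new byte[HEADER_BYTES]);
      long bodyBytes = 0;
      for (byte[] tile : tiles) {
        file.write(tile);
        bodyBytes += tile.length;
      }
      // Seek back to the start and fill in the header now that totals are known.
      file.seek(0);
      file.writeLong(tiles.size());
      file.writeLong(bodyBytes);
    }
  }
}
```

The final `seek(0)` is what makes a plain streaming upload awkward: the first bytes of the file are only final once everything else has been written. Uploading the first part last, as suggested above, would be the multipart analogue of that seek.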
At the very least, #502 should combine your first 2 steps into one step that takes less than 15 minutes.
It should be simple to run planetiler and rclone in sequence to perform the upload.
My thought for the next v4 spec of pmtiles (backwards compatible, don't worry) is to allow for the header and root directories to be at the end of the archive. This would make the entire format streamable, meaning planetiler could write to storage as it's assembling the tiles, saving time and local disk space.
To make this work however, we need to validate that every storage platform pmtiles v3 runs on supports end-addressing HTTP range requests correctly.
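For reference, an end-addressed (suffix) range request asks for the last N bytes of a resource without knowing its total size, using the HTTP header form `Range: bytes=-N`. A minimal sketch of building such a request with Java's built-in HTTP client follows. The class name and the idea of fetching a fixed-size tail are assumptions for illustration, not part of any PMTiles spec.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: build a GET request for the last `count` bytes of a file.
class SuffixRangeRequest {
  static HttpRequest lastBytes(URI uri, long count) {
    return HttpRequest.newBuilder(uri)
        .GET()
        // Suffix byte range: "bytes=-N" means "the final N bytes".
        .header("Range", "bytes=-" + count)
        .build();
  }
}
```

A storage platform that supports this correctly would answer with status 206 and a Content-Range header describing which bytes were returned. Checking that behavior on each platform is the validation step mentioned above.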
However, what @msbarry said about multipart uploads out-of-order would be even better and not require a spec revision. I'm not sure if that multipart behavior is consistent across storage platforms though.
Resolved by #502
Amazing, I need to try it out. Thanks @bdon for writing it and thanks @msbarry for the review!
Thanks a lot for this. Can we use the command line to generate PMTILES ?
Almost... I'm working on a change now so you can say --output=result.pmtiles to use the new functionality. Should be ready in a day or two.
I did a comparison between the direct pmtiles writer in planetiler and the mbtiles writer + conversion afterwards to pmtiles. I did a planet run with my custom map tileset https://github.com/wipfli/swiss-map. Here is the result:
- New pmtiles writer total duration: 9146 seconds
- Previous mbtiles writer followed by conversion to pmtiles: 9171 seconds
This is on a 12 core, 128 GB machine. The logs are available here: https://gist.github.com/wipfli/17bb8ad8d123f7d93313417dc7d4fac5
It is surprising that the new pmtiles writer does not outperform the old way significantly. Did I somehow mess up some settings?
Archive writing is the only part that gets faster with pmtiles.
Here's what I see for pmtiles:
2:32:17 INF - archive 1h14m23s cpu:13h38m25s gc:3m53s avg:11
2:32:17 INF - read 1x(8% 5m43s sys:55s wait:1h4m35s done:9s)
2:32:17 INF - encode 11x(94% 1h9m36s sys:5s wait:10s done:9s)
2:32:17 INF - write 1x(4% 2m50s sys:1m25s wait:1h10m42s) <<<<<<<<<<<---------- pmtiles
and for mbtiles:
2:32:41 INF - archive 1h15m8s cpu:13h47m35s gc:4m37s avg:11
2:32:41 INF - read 1x(8% 6m7s sys:1m5s wait:1h4m1s)
2:32:41 INF - encode 11x(92% 1h9m25s sys:7s wait:36s)
2:32:41 INF - write 1x(11% 8m sys:1m17s wait:1h3m14s) <<<<<<<<<<<<--------------- mbtiles
The archive time is dominated by encode since you only have 12 cores. If you run on a machine with 64-100+ cores, then encode starts to take less time and write dominates.
Nice thanks!
I had a bug in my script: I created swissmap.mbtiles with planetiler but then used pmtiles convert output.mbtiles output.pmtiles, and it turns out that output.mbtiles was a 155 MB file while swissmap.mbtiles is 35 GB...
I then checked how long pmtiles convert swissmap.mbtiles swissmap.pmtiles takes and it turns out that this is 22 minutes.
So the result in my custom planet tile set is:
- old method (output mbtiles and then convert): 9171 s + 22 min = 10491 s
- new method (direct pmtiles output): 9146 s
So the new method is almost 13 percent faster!
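The arithmetic behind those numbers can be checked with a few lines. The helper name `percentFaster` is made up for this example; the inputs are the figures quoted above.

```java
// Hypothetical helper to check the speedup arithmetic from the numbers above.
class SpeedupCheck {
  // Percentage of the old duration that the new method saves.
  static double percentFaster(long oldSeconds, long newSeconds) {
    return 100.0 * (oldSeconds - newSeconds) / oldSeconds;
  }

  public static void main(String[] args) {
    long oldSeconds = 9171 + 22 * 60; // mbtiles run plus 22 min conversion = 10491 s
    long newSeconds = 9146;           // direct pmtiles run
    System.out.println(percentFaster(oldSeconds, newSeconds));
  }
}
```

The saving is 1345 s out of 10491 s, which is about 12.8 percent and matches the "almost 13 percent" figure.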
Hi, one question: I am running this command to generate pmtiles:
sudo java -Xmx1g -jar planetiler.jar --download --area=monaco --output=monaco.pmtiles
However, no monaco.pmtiles file is being created inside the data folder, only an output.mbtiles file.
What am I missing?
Thanks
It creates output.pmtiles in the current folder, no?
Oh you're probably using the latest release jar and I haven't made a release for a while. I should do a release and get that up to date!