planet-dump-ng
Planet state file
It's a pain to figure out which replication state.txt file corresponds to a given planet. Now that the current and history dumps, in both XML and PBF, all correspond to the same state, it would be easier if the dump process or script worked out the state from which replication could continue.
The dump already tracks the last timestamp in the file, and this can be used to find a state file. But there might be in-progress transactions at that point, so it will be necessary to track backwards in the state files until before all those transactions start.
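The backwards search described above could be sketched roughly as follows. This is only an illustration under several assumptions: the function names are hypothetical, the state files are read from a local mirror of a replication directory laid out as AAA/BBB/CCC.state.txt, and each file has a `timestamp=` line in the Osmosis properties style (colons escaped with backslashes). It compares timestamps only and knows nothing about in-progress transactions, so the caller would need to pass a target timestamp that has already been rewound by a safe margin.

```shell
#!/bin/sh
# Sketch only: walk backwards through a local mirror of a replication
# directory to find the newest state at or before a target timestamp.

# state_path SEQ: relative path of the state file for a sequence number,
# e.g. 3867523 -> 003/867/523.state.txt
state_path() {
    sp_padded=$(printf '%09d' "$1")
    sp_a=$(printf '%s' "$sp_padded" | cut -c1-3)
    sp_b=$(printf '%s' "$sp_padded" | cut -c4-6)
    sp_c=$(printf '%s' "$sp_padded" | cut -c7-9)
    printf '%s/%s/%s.state.txt\n' "$sp_a" "$sp_b" "$sp_c"
}

# state_timestamp FILE: print the timestamp from a state file, removing the
# backslashes that the properties format puts before colons.
state_timestamp() {
    sed -n 's/^timestamp=//p' "$1" | tr -d '\\'
}

# ts_digits TS: reduce 2020-02-01T00:00:02Z to 20200201000002 so that
# UTC timestamps can be compared as integers.
ts_digits() {
    printf '%s' "$1" | tr -cd '0-9'
}

# find_state_before DIR START_SEQ TARGET_TS: print the highest sequence
# number <= START_SEQ whose state timestamp is not after TARGET_TS.
# Returns 1 if no such state file is found.
find_state_before() {
    fs_dir=$1
    fs_seq=$2
    fs_target=$(ts_digits "$3")
    while [ "$fs_seq" -ge 0 ]; do
        fs_file="$fs_dir/$(state_path "$fs_seq")"
        if [ -f "$fs_file" ]; then
            fs_ts=$(ts_digits "$(state_timestamp "$fs_file")")
            if [ -n "$fs_ts" ] && [ "$fs_ts" -le "$fs_target" ]; then
                printf '%s\n' "$fs_seq"
                return 0
            fi
        fi
        fs_seq=$((fs_seq - 1))
    done
    return 1
}
```

With these functions loaded, something like `find_state_before ./minute 3867523 2020-02-01T00:00:00Z` would print the sequence number to resume from, where `./minute` is an assumed local copy of the replication directory and the starting sequence number is any state known to be newer than the dump.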
Hi, sorry for hijacking your thread, but it's kind of related.
For the life of me, I can't figure this out.
As background, to explain my approach: we're building a private OSM server and a tile server as part of a bigger app. To keep the DB small and new clients as up to date as possible with the main OSM server, we decided to import only each client's area, not the whole country. When a new client joins, we re-download country.pbf, slice out his area, and import it without impacting our other clients (I think). Clients will edit the map using iD, so I've set up a tile server in sync. Replication using osmdbt is done, and I'm currently working on importing the changes using imposm or osm2pgsql, and here is the tricky bit.
I can't manage to generate the correct state.txt. Using osmium fileinfo on the generated PBF shows the latest change's timestamp, but no sequenceNumber.
To summarise, when a new client joins:
- download "country.pbf", slice his turf using a poly and import it using osmosis
- dump the updated database to a PBF using your tool (kudos, nice work)
- drop postgis and mapnik tiles and re-import with imposm or osm2pgsql
I'd appreciate some help, thanks :)
The planet-dump-ng software only sets the current time in the PBF header, not the sequence number. This is because the planet dump is an independent process from the replication diffs and neither depends on the other. Also, there are minutely, hourly and daily replication streams and each has a different (independent) sequence number.
There are tools to synchronise a planet dump with a chosen replication stream, for example pyosmium's up-to-date tool. This works by looking at the timestamp of the planet file, rewinding a bit and replaying the diffs covering that period.
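As a rough illustration of that approach, the wrapper below calls pyosmium-up-to-date with its `--server` and `-o` options. The function name `update_extract` is hypothetical, and the replication URL is whatever stream you choose to follow (for a private setup, presumably the one published by your own replication process).

```shell
#!/bin/sh
# Sketch only: bring a PBF file up to date against one replication stream.
# update_extract IN OUT SERVER
#   IN     - existing PBF file (its header timestamp is the starting point)
#   OUT    - updated PBF file to write
#   SERVER - base URL of the replication stream to follow
update_extract() {
    pyosmium-up-to-date -v --server "$3" -o "$2" "$1"
}
```

A call might look like `update_extract latest.pbf updated.pbf https://planet.openstreetmap.org/replication/minute/`.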
The general reason why these streams are all independent is that it previously wasn't easy to identify a linear point in time in Postgres, hence all the stuff in Osmosis' state file about txnActiveList and the xid column index in the database. More recently, Postgres made it easier to get access to the internals of the replication log, which made more robust tools like osmdbt possible and allows talking about a specific linear point in the log.
In summary: planet-dump-ng won't write the sequence number header in PBF files, so you'll have to use something else (e.g. pyosmium-up-to-date) to merge replication stream info into the planet file.
Hope that helps!
it does, thanks for the explanation.
Edit: technically, I'd rather reset the sequence numbers every time
Hi,
I wanted to contribute to your project so I'm pasting our dockerfiles here. Maybe you guys need it. Postgres version can be bumped to 12 without any hiccups, we're just not there and haven't tested it.
I removed a line, so it might hiccup on the volume permissions, but I don't think so.
Dockerfile:
FROM debian:buster-slim
ARG PLANET_DUMP_URL=https://github.com/zerebubuth/planet-dump-ng/archive/v1.2.0.tar.gz
RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        build-essential \
        autoconf \
        automake \
        ca-certificates \
        curl \
        libboost-date-time-dev \
        libboost-dev \
        libboost-filesystem-dev \
        libboost-iostreams-dev \
        libboost-program-options-dev \
        libboost-thread-dev \
        libosmpbf-dev \
        libprotobuf-dev \
        libxml2-dev \
        osmpbf-bin \
        pkg-config \
        postgresql-client-11; \
    useradd -u 999 -r planetdump; \
    mkdir /opt/build; \
    curl -sL "$PLANET_DUMP_URL" | tar xz -C /opt/build --strip-components=1; \
    cd /opt/build; \
    ./autogen.sh; \
    ./configure; \
    make -j "$(nproc)"; \
    make install; \
    cd /; \
    rm -rf /opt/build; \
    mkdir /dumps; \
    chown planetdump:planetdump /dumps
COPY entrypoint /usr/local/bin/entrypoint
VOLUME /dumps
USER planetdump
WORKDIR /dumps
ENTRYPOINT ["/usr/local/bin/entrypoint"]
CMD ["bash"]
entrypoint (chmod +x)
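#!/bin/sh
set -eu
# Output file name; override with the PBF_FILE environment variable.
PBF_FILE=${PBF_FILE:-latest.pbf}
case "${1:-}" in
    dump)
        cd /dumps
        # Remove intermediate files from a previous run.
        rm -rf users changeset* node* way* relation*
        echo "dumping OSM db"
        DUMP_FILE=$(mktemp)
        # pg_dump takes its connection settings from the PG* environment variables.
        pg_dump -F custom > "$DUMP_FILE"
        echo "creating PBF"
        planet-dump-ng -f "$DUMP_FILE" -p "$PBF_FILE"
        # Clean up the intermediate files and the temporary database dump.
        rm -rf users changeset* node* way* relation*
        rm -f "$DUMP_FILE"
        ;;
    *) exec "$@" ;;
esac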
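One possible way to run the image is sketched below as a shell function. The function name `run_dump` and the image name used in the example call are hypothetical. It assumes the database connection is configured through the standard libpq variables (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) in the calling environment, since the entrypoint calls pg_dump without arguments; `-e NAME` without a value passes the variable through from the host if it is set.

```shell
#!/bin/sh
# Sketch only: run the "dump" command of the entrypoint.
# run_dump IMAGE OUT_DIR
#   IMAGE   - name of the image built from the Dockerfile
#   OUT_DIR - host directory mounted at /dumps to receive the PBF
run_dump() {
    docker run --rm \
        -e PGHOST -e PGPORT -e PGUSER -e PGPASSWORD -e PGDATABASE \
        -e PBF_FILE \
        -v "$2:/dumps" \
        "$1" dump
}
```

For example, `run_dump planet-dump "$PWD/dumps"` would leave latest.pbf in ./dumps. The host directory has to be writable by uid 999, the planetdump user created in the Dockerfile.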
Edit: added missing file name, removed osmium.