nominatim-k8s

Persist database in docker image

Open • hallo02 opened this issue 6 years ago • 1 comment

Hey there

What would be the drawbacks of processing the desired pbf file and loading it into postgres during the image build? The docker-entrypoint.sh would then no longer need to distinguish between a "CREATE" and a "RESTORE" mode; it would just start postgresql. The original approach of restoring data using GKE would become unnecessary.

Obviously the image size could be enormous.

I tried the approach successfully with https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf (2.2MB). docker-entrypoint.sh was reduced to essentially:

#!/bin/bash
# Start PostgreSQL
service postgresql start

# Tail Apache logs
# tail -f /var/log/apache2/* &

# Run Apache in the foreground
/usr/sbin/apache2ctl -D FOREGROUND

The Dockerfile was extended as follows:

# Download pbf and import database
ARG NOMINATIM_PBF_URL

RUN NOMINATIM_DATA_PATH=${NOMINATIM_DATA_PATH:="/srv/nominatim/data"} \
    && NOMINATIM_DATA_LABEL=${NOMINATIM_DATA_LABEL:="data"} \
    && NOMINATIM_PBF_URL=${NOMINATIM_PBF_URL:="http://download.geofabrik.de/europe/switzerland-latest.osm.pbf"} \
    && NOMINATIM_POSTGRESQL_DATA_PATH=${NOMINATIM_POSTGRESQL_DATA_PATH:="/var/lib/postgresql/9.3/main"} \
    && curl -L $NOMINATIM_PBF_URL --create-dirs -o $NOMINATIM_DATA_PATH/$NOMINATIM_DATA_LABEL.osm.pbf \
    && chmod 755 $NOMINATIM_DATA_PATH \
    && service postgresql start \
    && (sudo -u postgres psql postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='nominatim'" | grep -q 1 || sudo -u postgres createuser -s nominatim) \
    && (sudo -u postgres psql postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='www-data'" | grep -q 1 || sudo -u postgres createuser -SDR www-data) \
    && sudo -u postgres psql postgres -c "DROP DATABASE IF EXISTS nominatim" \
    && useradd -m -p password1234 nominatim \
    && sudo -u nominatim /srv/nominatim/build/utils/setup.php --osm-file $NOMINATIM_DATA_PATH/$NOMINATIM_DATA_LABEL.osm.pbf --all --threads 2
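
Chains that combine && with a "check || create" step, as in the RUN instruction above, depend on how the shell groups the operators: && and || have equal precedence and are evaluated left to right. A minimal sketch of the difference grouping makes, with hypothetical stand-in functions in place of the real commands:

```shell
#!/bin/sh
# Hypothetical stand-ins for the real commands (not part of the Dockerfile).
earlier_step() { return 1; }   # simulates a failed earlier step, e.g. the download
role_exists()  { return 1; }   # simulates "role not found"
create_role()  { echo "create_role ran"; }

# Ungrouped: || reacts to the failure of earlier_step, so create_role still runs.
ungrouped=$(earlier_step && role_exists || create_role)

# Grouped: the braces are skipped entirely once earlier_step has failed.
grouped=$(earlier_step && { role_exists || create_role; }) || true

echo "ungrouped: ${ungrouped:-nothing ran}"
echo "grouped: ${grouped:-nothing ran}"
```

In the ungrouped form a failure anywhere earlier in the chain triggers the "create" command and the chain then carries on, so a failed download would not stop the build at that point.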

For a second attempt I used https://download.geofabrik.de/europe/switzerland-latest.osm.pbf (295MB). The image build took around 90 minutes, but I ran into the following problem: https://github.com/moby/moby/issues/22610. It seems the disk space on the host filled up during container startup until Kubernetes killed the pod. I will provide more information about the cause in the next few days. It is not caused by data from within the container. Maybe, as the linked issue suggests, it is caused by accumulated container logs.
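
If accumulated container logs are the suspect, one way to check is to look at the size of the log files Docker keeps on the host. A minimal sketch, assuming the json-file logging driver and the default /var/lib/docker data root (both are assumptions about the node, and the helper name is hypothetical):

```shell
#!/bin/sh
# List container log files under a Docker data root, largest first.
# Assumes the json-file logging driver, which writes
# <root>/containers/<id>/<id>-json.log on the host.
largest_container_logs() {
    root="${1:-/var/lib/docker}"
    find "$root/containers" -name '*-json.log' 2>/dev/null \
        | while read -r f; do
            # Print "<size in bytes> <path>" for each log file.
            printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
        done \
        | sort -rn | head -n 10
}
```

Running largest_container_logs on the affected node (usually as root) would show whether a single log file accounts for the lost space. If it does, the json-file driver's max-size and max-file options can cap log growth.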

Thank you for your thoughts.

hallo02, Apr 30 '19 17:04

Hi @hallo02

What would be the drawbacks if the desired pbf file would be processed and loaded into postgres during the image build?

I guess the main drawbacks are that you need to create a new image every time you want to update the data, and that the image would be enormous, as you pointed out. However, if it works for your use case, go for it.

What might help in your case is to load the data into Postgres in an "as builder" stage of a multi-stage build and then copy only the Postgres data files to the final image. This might let you leave behind a lot of accumulated logs and dependencies that you don't need when the container is running.
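
As a rough, untested sketch of that idea, the helper below prints a two-stage Dockerfile. The base image name, the import.sh script (standing in for the download and setup.php steps from the RUN instruction in the issue) and the helper name are all hypothetical; the data path is the one used in the issue.

```shell
#!/bin/sh
# Print a hypothetical two-stage Dockerfile to stdout.
# Arguments (both placeholders, not taken from this repository):
#   $1  base image that already has Nominatim and PostgreSQL installed
#   $2  PostgreSQL data directory (default: the path used earlier in this issue)
print_multistage_dockerfile() {
    base_image="${1:-nominatim-base:latest}"
    pgdata="${2:-/var/lib/postgresql/9.3/main}"
    cat <<EOF
# Stage 1: import the data, then stop PostgreSQL in the same RUN
# so the data files on disk are consistent before they are copied.
FROM ${base_image} AS builder
COPY import.sh /tmp/import.sh
RUN sh /tmp/import.sh && service postgresql stop

# Stage 2: start again from the base image and take only the data files.
FROM ${base_image}
COPY --from=builder --chown=postgres:postgres ${pgdata} ${pgdata}
EOF
}

print_multistage_dockerfile "$@"
```

The output could be piped into a build, for example print_multistage_dockerfile my-base:tag | docker build -f - -t nominatim-with-data . (the image names are placeholders). The final image would still contain the full database, so it stays large, but the downloaded pbf file and anything else written outside the data directory during the import would remain in the builder stage.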

peter-evans, May 01 '19 08:05