
Kubernetes Helm deployment for openmaptiles

Open mfuxi opened this issue 4 years ago • 7 comments

This is a feature request:

Create a Helm chart for the openmaptiles application so that we can easily deploy the app on top of Kubernetes.

mfuxi avatar Apr 05 '20 13:04 mfuxi

Hi. I'm gonna take a look into this.

robjuz avatar Jul 21 '21 14:07 robjuz

I've been working on a Helm chart for openmaptiles for a while now, and I plan to submit a PR when I finalize my work within the next month or so. In the meantime, I'll share some of my thoughts and results from working on this. Happy to answer any questions.

Openmaptiles requires several processes to run sequentially (start postgres, import osm, import wikidata, import sql ... generate tiles ...). Kubernetes does not natively provide a way to run Jobs sequentially, so we've had to split the process into several manual steps.
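
For context: Kubernetes does run a Pod's initContainers one at a time, in order, and starts the main container only after all of them succeed, which is the hack I lean on in step 2 below. An illustrative sketch of the pattern, not my actual chart (step names and commands are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: omt-import
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: step-1-download      # runs first
          image: openmaptiles/openmaptiles-tools
          command: ["bash", "-c", "download-osm planet"]
        - name: step-2-import-osm    # starts only after step 1 succeeds
          image: openmaptiles/openmaptiles-tools
          command: ["bash", "-c", "import-osm"]
      containers:
        - name: step-3-import-sql    # the main container runs last
          image: openmaptiles/openmaptiles-tools
          command: ["bash", "-c", "import-sql"]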

Step 1

I start a Postgres database on a GCP VM using these instructions. For the full planet I use an n1-highmem-32 machine.
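
Roughly something like this (a sketch only; the zone, disk size, and names here are placeholders, not my exact setup):

gcloud compute instances create omt-postgres \
    --zone=us-central1-a \
    --machine-type=n1-highmem-32 \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=1000GB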

Step 2

I created a Helm chart that downloads the extract (or planet) and runs the imports. I use several initContainers as a hack to run these steps sequentially, with the main container running the final step. For the full planet I used a Kubernetes cluster with a single n1-highmem-32 node. It took imposm about 13 hours to read and write the planet to postgres. The import-sql step took just shy of 24 hours to complete.

Step 3

After step 2 completes, I create a snapshot of the GCP disk that stores the openmaptiles database. Then I create an image from that snapshot and use the image to start up one or more large GCP VMs using a Managed Instance Group. This way I have several VMs running the same Postgres database. I used four n1-highmem-32 instances for tile generation.
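
In outline (a sketch with placeholder resource names and zone; the machine type and group size match what I described):

# 1. snapshot the disk that holds the openmaptiles database
gcloud compute disks snapshot omt-db-disk \
    --snapshot-names=omt-db-snap --zone=us-central1-a

# 2. turn the snapshot into a reusable image
gcloud compute images create omt-db-image --source-snapshot=omt-db-snap

# 3. boot N identical postgres VMs from a template of that image
gcloud compute instance-templates create omt-db-tmpl \
    --machine-type=n1-highmem-32 --image=omt-db-image
gcloud compute instance-groups managed create omt-db-mig \
    --template=omt-db-tmpl --size=4 --zone=us-central1-a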

Step 4

I created another Helm chart for tile generation based on the generate-tiles bash script. For very large extracts and the full planet, I modified the script to generate only the low and medium zoom levels in one pass, then generate the higher zoom levels one zoom at a time from tile lists (sketched below). After all tiles are generated, I generate the metadata for the mbtiles database. I used a Kubernetes cluster with a single n1-highmem-8 node. Tile generation for zooms 0-14 took 14h22m. I think it might be slightly faster with more CPUs on the Postgres VMs.
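
The two-phase shape of that process, condensed (a sketch; concurrency and retry flags omitted, and the zoom split is illustrative):

# $PGQUERY is the pgquery:// connection string for the postgres servers

# phase 1: one bounding-box pass for the low and medium zooms
tilelive-copy --exit --minzoom=0 --maxzoom=9 "$PGQUERY" tiles.mbtiles

# phase 2: for each higher zoom, derive the list of tiles worth
# generating from the data already in the file, then copy only those
for ZOOM in $(seq 10 14); do
  mbtiles-tools impute tiles.mbtiles --zoom $ZOOM --output tiles_$ZOOM.txt
  tilelive-copy --exit --scheme=list --list=tiles_$ZOOM.txt \
      "$PGQUERY" tiles.mbtiles
done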

I think it might be possible for steps 1 and 3 to run on Kubernetes rather than on VMs using something like CrunchyData's Kubernetes postgres operator.
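
With Crunchy's operator the database would be declared as a custom resource and the operator would handle provisioning and replicas. A sketch against its v1beta1 PostgresCluster API (I haven't tried this; names and sizes are placeholders):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: omt-db
spec:
  postgresVersion: 13
  instances:
    - name: instance1
      replicas: 1                 # bump for the step-3 read replicas
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 500Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 500Gi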

I'd also expect that something like Airflow could run the jobs sequentially rather than relying on initContainers and manual steps.

nickpeihl avatar Aug 12 '21 23:08 nickpeihl

@nickpeihl For steps 1 and 3 you can use the bitnami/postgresql chart, which also supports replication.
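
For example, something like this (a sketch; the exact value names depend on the chart version):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install omt-db bitnami/postgresql \
  --set auth.username=openmaptiles \
  --set auth.database=openmaptiles \
  --set architecture=replication \
  --set readReplicas.replicaCount=2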

In my opinion using initContainers for running sequential tasks is fine.

You could combine the step 2 and step 4 charts into one chart and make the jobs conditional on a variable in values.yaml.
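
For example (a sketch of the pattern; the key and file names are hypothetical):

# values.yaml
import:
  enabled: true   # run the import job; flip off once the import is done

# templates/import-job.yaml
{{- if .Values.import.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-import
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: import
          image: openmaptiles/openmaptiles-tools
          command: ["bash", "-c", "import-osm"]
{{- end }}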

I'm doing something similar in my nominatim chart. I have a job that runs the import, a deployment for nominatim itself, and another deployment that keeps the database up to date. With import.enabled: true my deployments are not created. I also set some extra postgresql params during the import.


I was unable to understand the makefile well enough to recreate the steps. Would love to see your charts!

robjuz avatar Aug 14 '21 15:08 robjuz

@robjuz TBH, I'm still very new to Helm/K8s, so any help you can provide is much appreciated. Here is the job spec I've created for importing OSM data, mimicking the makefile steps.

Right now, we use a postgres instance external to Kubernetes. I tried using a postgres chart as you suggested, but I haven't had much luck.

The PGHOST, PGPORT, and PGDATABASE environment variables are defined in the pg-conn configmap; PGUSER and PGPASSWORD are defined in the pg-auth secret.
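
For reference, those look roughly like this (a sketch; the values are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-conn
data:
  PGHOST: "10.0.0.2"
  PGPORT: "5432"
  PGDATABASE: "openmaptiles"
---
apiVersion: v1
kind: Secret
metadata:
  name: pg-auth
type: Opaque
stringData:
  PGUSER: "openmaptiles"
  PGPASSWORD: "changeme"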

...
      initContainers:
        - name: debug # 1
          #
          # Debug step -- will pause as long as importer-config configmap's DEBUG is non-empty
          #
          image: {{ .Values.openmaptiles.tools.image }}
          imagePullPolicy: {{ .Values.openmaptiles.tools.imagePullPolicy }}
          command: ["/bin/sh", "-c", 'bash -c "${STARTUP_SCRIPT}"']
          env:
            - name: STARTUP_SCRIPT
              value: |
                #!/usr/bin/env bash
                echo "/etc/importer-config/DEBUG is '$(cat /etc/importer-config/DEBUG)', will pause on non-empty"
                while [[ -s /etc/importer-config/DEBUG ]]; do
                  sleep 1
                done
                echo "Starting importer init steps..."
          envFrom:
            - configMapRef:
                name: pg-conn
            - secretRef:
                name: pg-auth
          volumeMounts:
            - name: importer-config
              mountPath: /etc/importer-config
              readOnly: true
            - name: data
              mountPath: "/data"

        - name: initialize # 2
          #
          # Download OpenMapTiles git repo
          # Generate mapping files and SQL scripts
          # Download OSM data (long step)
          # Import borders
          # Import OSM data (very long step)
          # Import Wikidata labels
          #
          image: {{ .Values.openmaptiles.tools.image }}
          imagePullPolicy: {{ .Values.openmaptiles.tools.imagePullPolicy }}
          command: ["/bin/sh", "-c", 'bash -c "${STARTUP_SCRIPT}"']
          env:
            - name: OMT_REPO
              value: {{ .Values.openmaptiles.repo }}
            - name: OMT_REVISION
              value: {{ .Values.openmaptiles.revision }}
            - name: PBF_AREA
              valueFrom:
                configMapKeyRef:
                  name: pbf-source
                  key: PBF_AREA
            - name: STARTUP_SCRIPT
              value: |
                #!/usr/bin/env bash
                set -euo pipefail
                echo "Running initialize stage @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                mkdir -p /data/state
                export DIFF_MODE=true
                export SRC_DIR=/data/src                             && mkdir -p "$SRC_DIR"
                export PBF_DATA_DIR=/data/pbf-download               && mkdir -p "$PBF_DATA_DIR"
                export IMPOSM_CONFIG_FILE="$PBF_DATA_DIR/imposm-config.yaml"
                export SQL_DIR=/data/build/sql
                export IMPOSM_MAPPING_FILE=/data/build/mapping.yaml
                export IMPOSM_CACHE_DIR=/data/imposm/cache
                export IMPOSM_DIFF_DIR=/data/imposm/diff             && mkdir -p "$IMPOSM_DIFF_DIR"
                export WIKIDATA_CACHE_FILE=/data/wikidata/cache.json
                # At this point always make sure Postgres is up using OMT user account
                echo "Wait for Postgres to start..."
                pgwait
                STATE=/data/state/initialize.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  rm -rf "$SRC_DIR"
                  mkdir -p "$(dirname "$SRC_DIR")"
                  git clone "$OMT_REPO" "$SRC_DIR"
                  cd "$SRC_DIR"
                  git checkout "$OMT_REVISION"
                  rm -rf "$IMPOSM_MAPPING_FILE"
                  mkdir -p "$(dirname "$IMPOSM_MAPPING_FILE")"
                  generate-imposm3 "$SRC_DIR/openmaptiles.yaml" > "$IMPOSM_MAPPING_FILE"
                  rm -rf "$SQL_DIR"
                  mkdir -p "$(dirname "$SQL_DIR")"
                  generate-sql "$SRC_DIR/openmaptiles.yaml" --dir "$SQL_DIR"
                  generate-sqltomvt "$SRC_DIR/openmaptiles.yaml" --key --gzip --function --fname=getmvt >> "$SQL_DIR/run_last.sql"
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                STATE=/data/state/download-pbf.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  echo "Downloading $PBF_AREA into $PBF_DATA_DIR"
                  # do not quote, the variable may expand into two params
                  download-osm $PBF_AREA --imposm-cfg "$IMPOSM_CONFIG_FILE" -- -d "$PBF_DATA_DIR"
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                STATE=/data/state/import-borders.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  import-borders
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                STATE=/data/state/import-osm.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  rm -rf "$IMPOSM_CACHE_DIR"
                  mkdir -p "$IMPOSM_CACHE_DIR"
                  import-osm
                  touch "$STATE"
                else
                   echo "Step $STATE is already done, skipping"
                fi
                STATE=/data/state/import-wikidata.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  # do not delete existing cache in case it already exists
                  mkdir -p "$(dirname "$WIKIDATA_CACHE_FILE")"
                  import-wikidata --cache "$WIKIDATA_CACHE_FILE" "$SRC_DIR/openmaptiles.yaml"
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                echo "------------------ @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z') --------------------"
          envFrom:
            - configMapRef:
                name: pg-conn
            - secretRef:
                name: pg-auth

          volumeMounts:
            - name: data
              mountPath: "/data"

        - name: import-data # 3
          #
          # Import Natural Earth, lake centerline, water polygons,
          # and any other data from import-data image
          #
          image: {{ .Values.openmaptiles.importData.image }}
          imagePullPolicy: {{ .Values.openmaptiles.importData.imagePullPolicy }}
          command: ["/bin/sh", "-c", 'sh -c "${STARTUP_SCRIPT}"']
          env:
            - name: STARTUP_SCRIPT
              value: |
                #!/usr/bin/env sh
                set -eu
                echo "Running import-data stage @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                STATE=/data/state/import-data.lock
                if [ ! -f "$STATE" ]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  pwd
                  ls -lsh
                  ./import_data.sh
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                echo "------------------ @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z') --------------------"
          envFrom:
            - configMapRef:
                name: pg-conn
            - secretRef:
                name: pg-auth
          volumeMounts:
            - name: data
              mountPath: "/data"

      containers:
        - name: finalize # 4
          #
          # Create/run all needed SQL code
          # Analyze DB to improve performance
          #
          image: {{ .Values.openmaptiles.tools.image }}
          imagePullPolicy: {{ .Values.openmaptiles.tools.imagePullPolicy }}
          command: ["/bin/sh", "-c", 'bash -c "${STARTUP_SCRIPT}"']
          env:
            - name: STARTUP_SCRIPT
              value: |
                #!/usr/bin/env bash
                set -euo pipefail
                echo "Running finalize stage @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                export SQL_DIR=/data/build/sql
                export PSQL_OPTIONS='-a -A'
                STATE=/data/state/import-sql.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  import-sql
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                STATE=/data/state/analyze.lock
                if [[ ! -f "$STATE" ]]; then
                  echo "Processing for state $STATE @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z')"
                  psql -v ON_ERROR_STOP="1" \
                       -P pager=off \
                       -c '\timing on' \
                       -c "ANALYZE VERBOSE"
                  touch "$STATE"
                else
                  echo "Step $STATE is already done, skipping"
                fi
                echo "------------------ @ $(date +'%Y-%m-%d %H:%M:%S.%3N %Z') --------------------"
                echo "Import complete!"
          envFrom:
            - configMapRef:
                name: pg-conn
            - secretRef:
                name: pg-auth
          volumeMounts:
            - name: data
              mountPath: "/data"
      volumes:
        - name: importer-config
          configMap:
            name: pbf-source
            items:
              - key: DEBUG
                path: DEBUG
        - name: data
          persistentVolumeClaim:
            claimName: pvc--{{ .Values.diskNamePrefix }}--{{ .Values.diskName }}
...

As I said, I currently keep the tile copier in a separate chart that is deployed after the import is done. Here is the job spec for that.

Most of the configs and secrets are the same as in the import chart. The PGHOSTS_LIST variable is defined in the configmap as PGHOSTS_LIST: {{ .Values.postgres.hosts | join "&host=" }}.
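
So a values file like the following (addresses are placeholders) renders into a single string that the script below splits on & to count hosts and to build the pgquery URL:

# values.yaml
postgres:
  hosts:
    - 10.0.0.2
    - 10.0.0.3
    - 10.0.0.4

# renders in the configmap as:
# PGHOSTS_LIST: 10.0.0.2&host=10.0.0.3&host=10.0.0.4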

    containers:
      - name: openmaptiles-copier
        image: {{ .Values.openmaptiles.tools.image }}
        imagePullPolicy: {{ .Values.openmaptiles.tools.imagePullPolicy }}
        # Hack
        # command: [ "/bin/bash", "-c", "--" ]
        # args: [ "while true; do sleep 30; done;" ]
        command: ["/bin/sh", "-c", 'bash -c "${COPY_SCRIPT}"']
        env:
          - name: OMT_REPO
            value: {{ .Values.openmaptiles.tools.repo }}
          - name: OMT_REVISION
            value: {{ .Values.openmaptiles.tools.revision }}
          - name: COPY_SCRIPT
            value: |
              #!/bin/bash
              set -o errexit
              set -o pipefail
              set -o nounset
              # For backward compatibility, allow both PG* and POSTGRES_* forms,
              # with the non-standard POSTGRES_* form taking precedence.
              # An error will be raised if neither form is given, except for the PGPORT
              export PGDATABASE="${POSTGRES_DB:-${PGDATABASE?}}"
              export PGUSER="${POSTGRES_USER:-${PGUSER?}}"
              export PGPASSWORD="${POSTGRES_PASSWORD:-${PGPASSWORD?}}"
              export PGPORT="${POSTGRES_PORT:-${PGPORT:-5432}}"
              # List of postgres servers
              # "xxx.xxx.xxx.xxx&host=xxx.xxx.xxx.xxx&host=..."
              if [[ -z "${PGHOSTS_LIST}" ]]
              then
                export HOST_COUNT=1
                export PGHOSTS="${POSTGRES_HOST:-${PGHOST?}}"
              else
                export HOST_COUNT=`awk -F"&" '{print NF}' <<< "${PGHOSTS_LIST}"`
                export PGHOSTS="${PGHOSTS_LIST}"
              fi
              if [[ -n "${NO_GZIP:-}" ]] && [[ "${NO_GZIP:-}" != "0" ]] && [[ "${NO_GZIP:-}" != "false" ]]; then
                export NO_GZIP="&nogzip=1"
              else
                export NO_GZIP=
              fi
              if [[ -n "${USE_KEY_COLUMN:-}" ]] && [[ "${USE_KEY_COLUMN:-}" != "0" ]] && [[ "${USE_KEY_COLUMN:-}" != "false" ]]; then
                export USE_KEY_COLUMN="&key=1"
              else
                export USE_KEY_COLUMN=
              fi
              if [[ -n "${TEST_ON_STARTUP:-}" ]] && [[ "${TEST_ON_STARTUP:-}" != "0" ]] && [[ "${TEST_ON_STARTUP:-}" != "false" ]]; then
                export TEST_ON_STARTUP_TILE="&testOnStartup=${TEST_ON_STARTUP}"
              else
                export TEST_ON_STARTUP_TILE=
              fi
              export FUNC_ZXY=${FUNC_ZXY:-getmvt}
              export COPY_CONCURRENCY=${COPY_CONCURRENCY:-1}  # number of CPUs per postgres server
              export MAX_HOST_CONNECTIONS=${MAX_HOST_CONNECTIONS:-${COPY_CONCURRENCY}}
              export ALL_STREAMS=$(( MAX_HOST_CONNECTIONS * HOST_COUNT ))
              export EXPORT_DIR=${EXPORT_DIR:-/export}
              export MBTILES_FILE=${EXPORT_DIR}/${MBTILES_FILE:-tiles.mbtiles}
              export RETRY=${RETRY:-2}
              export BBOX=${BBOX:-"-180.0,-85.0511,180.0,85.0511"}
              export TIMEOUT=${TIMEOUT:-1800000}
              export MIN_ZOOM=${MIN_ZOOM:-0}
              export MID_ZOOM=${MID_ZOOM:-9}
              export MAX_ZOOM=${MAX_ZOOM:-14}
              export SRC_DIR=${SRC_DIR:-/src}
              rm -rf "$SRC_DIR"
              mkdir -p "$(dirname "$SRC_DIR")"
              git clone "$OMT_REPO" "$SRC_DIR"
              cd "$SRC_DIR"
              git checkout "$OMT_REVISION"
              export PGQUERY="pgquery://?database=${PGDATABASE}&host=${PGHOSTS}&port=${PGPORT}&username=${PGUSER}&password=${PGPASSWORD}&funcZXY=${FUNC_ZXY}&maxpool=${MAX_HOST_CONNECTIONS}${NO_GZIP}${USE_KEY_COLUMN}${TEST_ON_STARTUP_TILE}"
              echo $PGQUERY
              mkdir -p "$EXPORT_DIR"
              rm -f "$MBTILES_FILE"
              echo "Generating zoom $MIN_ZOOM..$MID_ZOOM from $HOST_COUNT servers, using $MAX_HOST_CONNECTIONS connections per server, $ALL_STREAMS streams"
              tilelive-copy --exit --minzoom=$MIN_ZOOM --maxzoom=$MID_ZOOM --bbox=$BBOX --slow=300000 --retry=$RETRY --concurrency=$ALL_STREAMS --timeout=5400000 \
                  "$PGQUERY" "$MBTILES_FILE"
              echo "Finished generating zoom $MIN_ZOOM..$MID_ZOOM"
              for (( ZOOM=MID_ZOOM+1; ZOOM<=MAX_ZOOM; ZOOM++ )); do
                LIST_FILE="$EXPORT_DIR/tiles_$ZOOM.txt"
                echo "Imputing tiles for zoom $ZOOM"
                mbtiles-tools impute "$MBTILES_FILE" --zoom $ZOOM --output "$LIST_FILE" --verbose
                echo "Generating zoom $ZOOM from $HOST_COUNT servers, using $MAX_HOST_CONNECTIONS connections per server, $ALL_STREAMS streams"
                tilelive-copy --exit --scheme=list "--list=$LIST_FILE" --slow=30000 --retry=$RETRY --concurrency=$ALL_STREAMS --timeout=300000 \
                    "$PGQUERY" "$MBTILES_FILE"
                echo "Finished generating zoom $ZOOM"
              done
              echo "Updating generated tile metadata"
              mbtiles-tools meta-generate "$MBTILES_FILE" "$SRC_DIR/openmaptiles.yaml" --auto-minmax --show-ranges
              echo "Finished tile metadata"
              echo "Tile generation complete!"
        envFrom:
          - configMapRef:
              name: pg-conn
          - configMapRef:
              name: omt-copier-config
          - secretRef:
              name: pg-auth
        volumeMounts:
          - name: export
            mountPath: {{ .Values.exportDir }}
    volumes:
        - name: export
          persistentVolumeClaim:
            claimName: pvc--{{ .Values.diskNamePrefix }}--{{ .Values.diskName }}

I hope this helps. Let me know if you have questions. There's probably a lot of cleanup that could be done since I've been working on this for so long I've gone cross-eyed 😵.

nickpeihl avatar Sep 04 '21 02:09 nickpeihl

Has this ever been finished and officially published?

srudin avatar Oct 20 '22 05:10 srudin

Has this ever been finished and officially published?

No updates from my end. Still a WIP.

nickpeihl avatar Oct 20 '22 14:10 nickpeihl

I was hacking on it for a while, but the reality is that it is far easier to instrument planetiler than to bring up the current openmaptiles infrastructure. I think eventually OMT will adopt planetiler as a tile generation engine, and it will be far easier to move forward.

nyurik avatar Oct 20 '22 14:10 nyurik