pgosm-flex
pgosm-flex copied to clipboard
PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
PgOSM Flex
PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS using the osm2pgsql Flex output. This project provides a curated set of Lua and SQL scripts to clean and organize the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs).
The easiest way to use PgOSM Flex is via the Docker image. For ultimate control and customization, there are instructions for installing and running manually.
Project decisions
A few decisions made in this project:
- ID column is
osm_id - Geometry stored in SRID 3857 (customizable)
- Geometry column named
geom - Default to same units as OpenStreetMap (e.g. km/hr, meters)
- Data not deemed worthy of a dedicated column goes in side table
osm.tags. Raw key/value data stored inJSONBcolumn - Points, Lines, and Polygons are not mixed in a single table
This project's approach is to do as much processing in the Lua styles passed along to osm2pgsql, with post-processing steps creating indexes, constraints and comments.
Versions Supported
Minimum versions supported:
- Postgres 12
- PostGIS 3.0
- osm2pgsql 1.6.0
Minimum Hardware
osm2pgsql requires at least 2 GB RAM. Fast SSD drives are strongly recommended.
PgOSM via Docker
The easiest way to use PgOSM-Flex is with the
Docker image hosted on
Docker Hub.
The image has all the pre-requisite software installed,
handles downloading an OSM region (or subregion)
from Geofabrik, and saves an output .sql file with the processed data
ready to load into your database(s).
The PBF/MD5 source files are archived by date with the ability to
easily reload them at a later date.
Basic Docker usage
This section outlines the basic operations for using Docker to run PgOSM-Flex. See the full Docker instructions in docs/DOCKER-RUN.md.
Create directory for the .osm.pbf file, output .sql file, log output, and
the osm2pgsql command ran.
mkdir ~/pgosm-data
Set environment variables for the temporary Postgres connection in Docker. These are required for the Docker container to run.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
Start the pgosm Docker container. At this point, PostgreSQL / PostGIS
is available on port 5433.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Use docker exec to run the processing for the Washington D.C subregion.
This example uses three (3) parameters to specify the totaol system RAM (8 GB)
along with a region/subregion.
- Total RAM for osm2pgsql, Postgres and OS (
8) - Region (
north-america/us) - Sub-region (
district-of-columbia) (Optional)
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The above command takes roughly 1 minute to run if the PBF for today has already been downloaded. If the PBF is not downloaded it will depend on how long it takes to download the 17 MB PBF file + ~ 1 minute processing.
After processing
The ~/pgosm-data directory has three (3) files from a single run.
The PBF file and its MD5 checksum have been renamed with the date in the filename.
This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The docker exec command uses the PGOSM_DATE environment variable
to load these historic files.
The output .sql is also saved in the ~/pgosm-data directory.
ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 17M Nov 2 19:57 district-of-columbia-2021-11-03.osm.pbf
-rw-r--r-- 1 root root 70 Nov 2 19:59 district-of-columbia-2021-11-03.osm.pbf.md5
-rw-r--r-- 1 root root 156M Nov 3 19:10 pgosm-flex-north-america-us-district-of-columbia-default-2021-11-03.sql
This .sql file can be loaded into a PostGIS enabled database. The following example
creates an empty myosm database to load the processed OpenStreetMap data into.
psql -d postgres -c "CREATE DATABASE myosm;"
psql -d myosm -c "CREATE EXTENSION postgis;"
psql -d myosm \
-f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-default-2021-11-03.sql
The processed OpenStreetMap data is also available in the Docker container on port 5433.
You can connect and query directly in the Docker container.
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"
┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
See more in docs/DOCKER-RUN.md about other ways to customize how PgOSM Flex runs.
On-server import
Don't want to use the Docker process? See docs/MANUAL-STEPS-RUN.md for prereqs and steps for running without Docker.
Layer Sets
PgOSM Flex includes a few layersets and makes it easy to customize your own. See docs/LAYERSETS.md for details.
QGIS Layer Styles
Use QGIS to visualize OpenStreetMap data? This project includes a few basic
styles using the public.layer_styles table created by QGIS.
See the QGIS Style README.md for more information.
Loaded by Docker process by default. Is excluded when --data-only used.
Explore data loaded
A peek at some of the tables loaded.
This query requires the
PostgreSQL Data Dictionary (PgDD) extension,
use \dt+ osm.* in psql for similar details.
SELECT s_name, t_name, rows, size_plus_indexes
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY t_name LIMIT 10;
┌────────┬──────────────────────┬────────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪════════╪═══════════════════╡
│ osm │ amenity_line │ 7 │ 56 kB │
│ osm │ amenity_point │ 5796 │ 1136 kB │
│ osm │ amenity_polygon │ 7593 │ 3704 kB │
│ osm │ building_point │ 525 │ 128 kB │
│ osm │ building_polygon │ 161256 │ 55 MB │
│ osm │ indoor_line │ 1 │ 40 kB │
│ osm │ indoor_point │ 5 │ 40 kB │
│ osm │ indoor_polygon │ 288 │ 136 kB │
│ osm │ infrastructure_point │ 884 │ 216 kB │
│ osm │ landuse_point │ 18 │ 56 kB │
└────────┴──────────────────────┴────────┴───────────────────┘
Meta table
PgOSM-Flex tracks basic metadata in table osm.pgosm_flex.
The ts is set by the post-processing script. It does not necessarily
indicate the date of the data loaded, though in general it should be close
depending on your processing pipeline.
SELECT *
FROM osm.pgosm_flex;
┌────────────┬──────────────┬───────────────┬────────────────────┬──────┬───────────────────────────────────┬───────────────────┐
│ osm_date │ default_date │ region │ pgosm_flex_version │ srid │ project_url │ osm2pgsql_version │
╞════════════╪══════════════╪═══════════════╪════════════════════╪══════╪═══════════════════════════════════╪═══════════════════╡
│ 2020-01-01 │ t │ north-america │ 0.1.1-f488d7b │ 3857 │ https://github.com/rustprooflabs/…│ 1.4.1 │
│ │ │ │ │ │…pgosm-flex │ │
└────────────┴──────────────┴───────────────┴────────────────────┴──────┴───────────────────────────────────┴───────────────────┘
Query examples
For example queries with data loaded by PgOSM-Flex see docs/QUERY.md.
Points of Interest (POIs)
Loads an range of tags into a materialized view (osm.poi_all) for easy searching POIs.
Line and polygon data is forced to point geometry using
ST_Centroid(). This layer duplicates a bunch of other more specific layers
(shop, amenity, etc.) to provide a single place for simplified POI searches.
Special layer included by layer sets run-all and run-no-tags.
See style/poi.lua for logic on how to include POIs.
The topic of POIs is subject and likely is not inclusive of everything that probably should be considered
a POI. If there are POIs missing
from this table please submit a new issue
with sufficient details about what is missing.
Pull requests also welcome! See CONTRIBUTING.md.
Counts of POIs by osm_type.
SELECT osm_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY osm_type
ORDER BY COUNT(*) DESC;
Results from Washington D.C. subregion (March 2020).
┌──────────┬───────┐
│ osm_type │ count │
╞══════════╪═══════╡
│ amenity │ 12663 │
│ leisure │ 2701 │
│ building │ 2045 │
│ shop │ 1739 │
│ tourism │ 729 │
│ man_made │ 570 │
│ landuse │ 32 │
│ natural │ 19 │
└──────────┴───────┘
Includes Points (N), Lines (L) and Polygons (W).
SELECT geom_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY geom_type
ORDER BY COUNT(*) DESC;
┌───────────┬───────┐
│ geom_type │ count │
╞═══════════╪═══════╡
│ W │ 10740 │
│ N │ 9556 │
│ L │ 202 │
└───────────┴───────┘
One table to rule them all
From the perspective of database design, the osm.unitable option is the worst!
This violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.
This style included in PgOSM-Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know where you are looking.
Unitable is loaded with the everything layerset. Feel free to create your own
customized layerset if needed.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--layerset=everything
The
unitable.luascript include in in this project was adapted from the unitable example from osm2pgsql. This version uses JSONB instead of HSTORE and takes advantage ofhelpers.luato easily customize SRID.
JSONB support
PgOSM-Flex uses JSONB in Postgres to store the raw OpenSteetMap
key/value data (tags column)
and relation members (member_ids).
Current JSONB columns:
osm.tags.tagsosm.unitable.tagsosm.place_polygon.member_idsosm.vplace_polygon.member_idsosm.poi_polygon.member_ids
Projects using PgOSM Flex
See the listing of known projects using PgOSM Flex.
Additional resources
Blog posts covering various details and background information.