maps-docker-compose
Planet Mapping
@chesty,
Sorry to bug you again, I know you don't have much interest in the project anymore, but I wondered if you could give me some insights. Geofabrik doesn't have a planet.osm.pbf file, so I'm using OSM's pbf and just commented out the update URL in osm.env. I'm looking for a way to run the update, but I'm unsure how the container handles updates, or what files and file types it's expecting.
For reference, I'm pulling the planet from here: https://ftpmirror.your.org/pub/openstreetmap/pbf/
I googled and I think https://planet.openstreetmap.org/replication/day/ might be the right URL.
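To make that concrete, the idea would be to point the replication variable in osm.env back at that daily feed. I don't know the exact variable name the project uses, so this is only a sketch; REPLICATION_URL below is a placeholder name, check osm.env for the real one (OSM_PBF is the file name the compose setup already uses):
# osm.env -- sketch only; REPLICATION_URL is a placeholder, not necessarily the project's variable name
OSM_PBF=planet-latest.osm.pbf
REPLICATION_URL=https://planet.openstreetmap.org/replication/day/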
I'm happy to answer questions but you'll have to be patient. I remember fixing a bug with the update section many months ago but it's not very well tested so you're the tester.
@chesty
Awesome, once I'm done with initdb I'll make the change and see if it works and I'll report back. Thanks again
I don't think I'm working on the weekend so post any questions you have here. I'm not an expert on osm though. And I've forgotten a lot since I stopped using it.
Awesome, thank you. I'm kind of surprised it's still churning. It's been almost 3 days and initdb isn't done yet. I tuned postgres with pgtune and it's using 252.9GiB of the 503.8GiB of available memory but hardly any CPU, so Postgres must be memory-intensive during this phase.
docker stats shows CPU % at 35-40%, but top, htop, etc. show the CPU almost completely idle, with utilization around 1.4%, which is weird.
Either way, it's still cruising so when it's done, I'll report back.
Hmmm, maybe IO related? You could use vmstat or iostat. I guess pgtune asked you if you had SSDs, because that influences how postgres should be configured.
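A rough sketch of checking that from the host (vmstat comes with procps, iostat needs the sysstat package):
# watch the "wa" (IO wait) and "us"/"sy" (CPU) columns every 5 seconds
vmstat 5
# per-device view: high %util and await point at the disks being the bottleneck
iostat -x 5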
Is this the command that gets run?
gosu osm osm2pgsql -G -U "$POSTGRES_USER" -d "$POSTGRES_DB" -H "$POSTGRES_HOST" --slim -C "$OSM2PGSQLCACHE" \
    --style /usr/local/share/openstreetmap-carto/openstreetmap-carto.style \
    --tag-transform-script /usr/local/share/openstreetmap-carto/openstreetmap-carto.lua \
    --hstore --hstore-add-index \
    --number-processes $NPROCS \
    /data/"$OSM_PBF"
If you run
ps auxwww | grep osm2pgsql
check that you see --number-processes 64 (or whatever number of CPUs you have made available).
Also, $OSM2PGSQLCACHE would need to be tuned; you'd have to google for the optimum setting for your system, I have no idea what it should be. Basically, go through osm.env and osm-config.sh and check that every variable is right for your system, using google or the docs to find out. Getting to know renderd-docker-entrypoint.sh would be a good idea too.
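One quick way to see which variables get defaults, assuming they all follow the same ": ${VAR:=...}" pattern as the NPROCS line quoted below (so you know what can be overridden from osm.env):
# list the default-setting lines in the entrypoint config
grep -F ': ${' osm-config.sh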
Also google tuning postgres for a large import. You probably want to delete all the indexes before the import and add them back after it finishes; there might be other things you could do too, like spacing out checkpoints, I can't remember exactly what the setting is called. It does sound like osm2pgsql should be using all your CPUs but is only using 1, so I would read the docs, FAQ and wiki of osm2pgsql.
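A minimal sketch of that index trick, assuming there are pre-existing indexes on the tables being loaded (the database, user and host names match the osm2pgsql command above; the index name in the DROP is illustrative only):
# save the definitions of the existing indexes so they can be recreated after the import
psql -U postgres -h postgres -d gis -c "SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public';"
# drop an index before the import, then re-run its saved indexdef once the import has finished
psql -U postgres -h postgres -d gis -c "DROP INDEX IF EXISTS some_index_name;"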
If you look at osm-config.sh, it does:
: ${NPROCS:=$(grep -c ^processor /proc/cpuinfo 2>/dev/null || 1)}
Basically, if NPROCS isn't already set in osm.env, it tries to count the number of CPUs found in /proc/cpuinfo, and if that doesn't work it sets it to 1 CPU. It sounds like for some reason it's set to 1. I would exec bash in the render container and less /proc/cpuinfo
to check the container has lots of CPUs, and also double check the osm2pgsql --number-processes. If it's 1, you might be better off setting NPROCS=50,
or a few less than the max, in osm.env.
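For example, from the host (substitute whatever name docker-compose gave your render container):
# count the CPUs the render container can actually see
docker exec <render-container> grep -c ^processor /proc/cpuinfo
# confirm the --number-processes value osm2pgsql was actually started with
docker exec <render-container> sh -c 'ps auxwww | grep [o]sm2pgsql'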
gis 124176 34.9 1.0 7534216 5792112 ? Sl Jul07 1240:55 osm2pgsql -G -U postgres -d gis -H postgres --slim -C 4000 --style /usr/local/share/openstreetmap-carto/openstreetmap-carto.style --tag-transform-script /usr/local/share/openstreetmap-carto/openstreetmap-carto.lua --hstore --hstore-add-index --number-processes 56 /data/planet-latest.osm.pbf
It says 56, which is how many cores I have.
Both the initdb and postgres containers have all 56 CPUs listed.
My pgtune showed this:
# WARNING
# this tool not being optimal
# for very high memory systems
# DB Version: 13
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 512 GB
# CPUs num: 56
# Data Storage: ssd
max_connections = 200
shared_buffers = 128GB
effective_cache_size = 384GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 167772kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 56
max_parallel_workers_per_gather = 4
max_parallel_workers = 56
max_parallel_maintenance_workers = 4
checkpoint_completion_target = 0.9
This is part of the checkpoint thing I was trying to remember; there are other checkpoint settings too, I think a timeout and a size. From memory (look it up to be sure), 0.9 means 90%: new writes first go to the write-ahead log, which has a size limit you can adjust, and there's also a checkpoint interval timer; the work of flushing that data out to the main tables is spread over 90% of the interval. A checkpoint is an expensive operation, so the more frequent the checkpoints, the slower the import will be. You can space the checkpoints out by increasing the size and time between them; the trade-off is that if there's a crash, it will take postgres longer to start, as it has to replay all the transactions still sitting in the WAL. There's a sketch of the relevant settings after the next couple of points.
I don't know why it's only using 1 CPU; maybe it's working as well as it can, or maybe you could improve it. Google is your friend.
-C 4000 (the osm2pgsql node cache, in MB) could probably be a lot bigger; you'd have to google.
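As a sketch of spacing the checkpoints out for the import (the values are illustrative guesses, not tested recommendations, and can be reverted after the import; user and host match the setup above):
# fewer, larger checkpoints = faster bulk import, but longer crash recovery
psql -U postgres -h postgres -c "ALTER SYSTEM SET max_wal_size = '50GB';"
psql -U postgres -h postgres -c "ALTER SYSTEM SET checkpoint_timeout = '30min';"
psql -U postgres -h postgres -c "ALTER SYSTEM SET checkpoint_completion_target = 0.9;"
psql -U postgres -h postgres -c "SELECT pg_reload_conf();"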
Yeah, I'm looking into checkpoint_completion_target and other checkpoint options, thank you for the tip.
I have been reading benchmarks of osm2pgsql imports running at a much faster pace (half a day or so for the whole planet) than what I have going on, so I'm probably going to install it without containerizing it, and then rework the docker project in the next month or so to see what could be slowing it down. I'd usually spend some time on it, but I have some urgency being pushed on me by another department.