Ability to have pre-processing script for osm2pgsql-replication

Open stalker314314 opened this issue 2 years ago • 2 comments

What version of osm2pgsql are you using?

osm2pgsql version 1.6.0
Build: None
Compiled using the following library versions:
Libosmium 2.18.0
Proj [API 6] 9.0.1
Lua 5.3.6

What operating system and PostgreSQL/PostGIS version are you using?

Debian testing

What did you do exactly?

Scenario: I want to be able to trim the .osc file inside osm2pgsql-replication after it is collected from the replication server, but before it is sent to osm2pgsql for processing. One use case is calling trim_osc; another is diffing the .osc against PostGIS (this is the only time a proper diff against the current data is possible, before the .osc data enters the database).

As far as I know, there is no way to intervene between these two events (getting the .osc from the replication server and sending it to osm2pgsql).

I was thinking it could follow the same pattern as the post-processing logic: another switch, --pre-processing, taking even the same arguments (seq and timestamp). As discussed in #1719, if one needs the path to the .osc, one can provide the --diff-file argument to the osm2pgsql-replication script.

The only question (besides whether you want to support this) is whether the .osc file should be edited in place, or whether we want something more sophisticated (for example, the pre-processing script does not edit the file in place but saves the result somewhere else and returns the output path on stdout, or just streams the new .osc to stdout, where we capture it). IMHO, in-place editing would be the easiest, but I am open to suggestions.
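
To make the in-place variant concrete, here is a minimal sketch of what such a pre-processing script could look like. It is only an illustration: the --pre-processing switch does not exist, the bounding box values and the function names are made up, and the filter is much cruder than trim_osc (it only drops nodes whose coordinates fall outside the box, and leaves ways, relations and coordinate-less nodes untouched). It assumes the diff file is uncompressed OsmChange XML.

```python
#!/usr/bin/env python3
"""Hypothetical pre-processing hook: trim an .osc file in place to a bounding box."""
import os
import sys
import tempfile
import xml.etree.ElementTree as ET

# Assumed bounding box (min_lon, min_lat, max_lon, max_lat); adjust to the imported area.
BBOX = (18.8, 42.2, 23.0, 46.2)


def node_outside(node, bbox):
    """Return True if the node has coordinates and they lie outside bbox."""
    lat, lon = node.get("lat"), node.get("lon")
    if lat is None or lon is None:
        return False  # no coordinates available: keep the node
    min_lon, min_lat, max_lon, max_lat = bbox
    inside = min_lon <= float(lon) <= max_lon and min_lat <= float(lat) <= max_lat
    return not inside


def trim_osc_in_place(path, bbox=BBOX):
    """Drop out-of-bbox nodes from the OsmChange file at path; return how many were removed."""
    tree = ET.parse(path)
    removed = 0
    # The children of <osmChange> are the <create>, <modify> and <delete> blocks.
    for action in tree.getroot():
        for node in list(action.findall("node")):
            if node_outside(node, bbox):
                action.remove(node)
                removed += 1
    # Write to a temporary file in the same directory, then swap it in,
    # so a crash never leaves a half-written diff behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".osc")
    os.close(fd)
    tree.write(tmp, encoding="utf-8", xml_declaration=True)
    os.replace(tmp, path)
    return removed


if __name__ == "__main__" and len(sys.argv) == 2:
    count = trim_osc_in_place(sys.argv[1])
    print(f"removed {count} nodes", file=sys.stderr)
```

With in-place editing, the script would be called with the path that was given to --diff-file, and osm2pgsql would then read the same file afterwards; no output path or stdout capture is needed.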

I am also volunteering to implement this logic (if we agree it can be useful).

stalker314314 avatar Aug 01 '22 19:08 stalker314314

Does https://switch2osm.org/serving-tiles/updating-as-people-edit-pyosmium/ help? That was what I ended up doing following a suggestion by lonvia when I asked pretty much the same question. With osm2pgsql-replication, you'll often not need to call "trim" as you can likely get a feed of the same area you loaded.

SomeoneElseOSM avatar Aug 01 '22 19:08 SomeoneElseOSM

Yes, it can do the job, but my issue is specifically about osm2pgsql-replication. I think it is a great piece of software and I would like to see a one-stop solution for these use cases. And switch2osm could be simplified if this were implemented, I think :) As it is written at your link:

A simpler, but less flexible, method to update a database is to use “osm2pgsql-replication”,

(emphasis mine)

I think it can be both simple and flexible, and this issue is about that.

stalker314314 avatar Aug 02 '22 20:08 stalker314314

This is outside the scope of the osm2pgsql-replication script. You should look into the underlying replication library of pyosmium and build your own custom Python scripts.
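
For readers going down that route, the rough shape of such a custom script might look like the sketch below. It is not what osm2pgsql-replication does internally: it fetches a single diff by sequence number and leaves the bookkeeping of which sequence comes next to the caller. The server URL, the diff path, the database name and the `preprocess` hook are assumptions for illustration, and the osm2pgsql command would need the same style and output options as the original import. It relies on pyosmium's `ReplicationServer.get_diff_block()`, which, as far as I understand it, returns the diff as gzip-compressed bytes.

```python
import gzip
import subprocess

# Assumed values; replace with the replication source and paths of your setup.
SERVER_URL = "https://planet.openstreetmap.org/replication/minute"
DIFF_PATH = "/tmp/changes.osc"


def osm2pgsql_command(diff_path, database):
    """Build the osm2pgsql call that appends a diff to an existing slim database."""
    return ["osm2pgsql", "--append", "--slim", "-d", database, diff_path]


def preprocess(osc_bytes):
    """Hypothetical hook: return the (possibly modified) uncompressed OsmChange XML."""
    return osc_bytes


def apply_one_diff(seq, database, hook=preprocess):
    """Download diff number seq, run the hook on it, then hand it to osm2pgsql."""
    # Imported here so the helpers above can be used without pyosmium installed.
    from osmium.replication.server import ReplicationServer

    server = ReplicationServer(SERVER_URL)
    raw = server.get_diff_block(seq)      # assumed to be gzip-compressed .osc.gz content
    osc = hook(gzip.decompress(raw))      # the pre-processing step happens here
    with open(DIFF_PATH, "wb") as f:
        f.write(osc)
    subprocess.run(osm2pgsql_command(DIFF_PATH, database), check=True)
```

Keeping the hook as a plain function that takes and returns bytes means trimming, diffing against the database, or any other step can be swapped in without touching the download and import parts.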

lonvia avatar Nov 10 '22 09:11 lonvia