osm2pgsql
osm2pgsql copied to clipboard
Non moving node updates should not update related way/relation geometries
When a node is modified, osm2pgsql does not make a difference if it is just a tag change or a lat/lon change.
When the lat/lon is the same as the actual one in the database, there is no impact on ways and relations refering to that node and their geometry do not need to be updated.
I suggest to add a test in node_modify to check if the lat/lon has changed or not.
The actual code is very basic:
- delete the node from the DB
- set the node location in the cache
- create the node in the DB
- look for related ways/relations and mark them as pending for update as well
It could be more efficient:
- update the node in the DB (UPDATE instead of DELETE/INSERT may also limit DB bloat)
- if location has changed:
- set the new location in the cache
- mark related ways/relations as pending for update
I've implemented the test for location change, and it shows a 10x speedup in the node parsing step for nodes that did not move.
In the diffs I've used for my tests, a large portion of nodes had new coordinates, so the speedup is reduced but still there for the others (from 10 to 50% in my tests).
The benefit is not limited to the node parsing... but cascades to ways and relations as fewer pending ones need to be checked and updated.
Do you like to do a quick PR? I can't promise that we take it as is but it would be easier to discuss the implications with the actual code.
I'll do. It's only a few lines change.
Be aware... I've not written much C++ in my life, it may hurt purists ;)
Just pushed the PR: https://github.com/openstreetmap/osm2pgsql/pull/1059
@cquest Can you elaborate on what you tested exactly and how the numbers changed with and without this patch? There are several issues here that could affect timing: With the patch fewer ways (and relations) will be marked as (possibly) changed. On the other hand we have to lookup the old location of all nodes, which is cheap if we are using flat node store, but possibly expensive when using the database. All of this, of course, depends on what kind of node changes there are usually.
To figure out the best way to do this we need some numbers not from isolated queries, but from real-world update runs. Does this, on avarage, speed up things considerably or not? How does the flat node factor in here? How does the bucket index? @cquest You speek of a 10% speed up which would be well worth it, but I am not sure what you have been measuring there.