osm2pgsql icon indicating copy to clipboard operation
osm2pgsql copied to clipboard

Non moving node updates should not update related way/relation geometries

Open cquest opened this issue 4 years ago • 5 comments

When a node is modified, osm2pgsql does not make a difference if it is just a tag change or a lat/lon change.

When the lat/lon is the same as the actual one in the database, there is no impact on ways and relations refering to that node and their geometry do not need to be updated.

I suggest to add a test in node_modify to check if the lat/lon has changed or not.

The actual code is very basic:

  • delete the node from the DB
  • set the node location in the cache
  • create the node in the DB
  • look for related ways/relations and mark them as pending for update as well

It could be more efficient:

  • update the node in the DB (UPDATE instead of DELETE/INSERT may also limit DB bloat)
  • if location has changed:
    • set the new location in the cache
    • mark related ways/relations as pending for update

cquest avatar Jan 03 '20 11:01 cquest

I've implemented the test for location change, and it shows a 10x speedup in the node parsing step for nodes that did not move.

In the diffs I've used for my tests, a large portion of nodes had new coordinates, so the speedup is reduced but still there for the others (from 10 to 50% in my tests).

The benefit is not limited to the node parsing... but cascades to ways and relations as fewer pending ones need to be checked and updated.

cquest avatar Jan 04 '20 16:01 cquest

Do you like to do a quick PR? I can't promise that we take it as is but it would be easier to discuss the implications with the actual code.

lonvia avatar Jan 08 '20 10:01 lonvia

I'll do. It's only a few lines change.

Be aware... I've not written much C++ in my life, it may hurt purists ;)

cquest avatar Jan 09 '20 15:01 cquest

Just pushed the PR: https://github.com/openstreetmap/osm2pgsql/pull/1059

cquest avatar Jan 11 '20 15:01 cquest

@cquest Can you elaborate on what you tested exactly and how the numbers changed with and without this patch? There are several issues here that could affect timing: With the patch fewer ways (and relations) will be marked as (possibly) changed. On the other hand we have to lookup the old location of all nodes, which is cheap if we are using flat node store, but possibly expensive when using the database. All of this, of course, depends on what kind of node changes there are usually.

To figure out the best way to do this we need some numbers not from isolated queries, but from real-world update runs. Does this, on avarage, speed up things considerably or not? How does the flat node factor in here? How does the bucket index? @cquest You speek of a 10% speed up which would be well worth it, but I am not sure what you have been measuring there.

joto avatar Oct 20 '20 07:10 joto