
Features from other changesets

Open geohacker opened this issue 8 years ago • 12 comments

Seeing features from other changesets. This is because of the adiff overpass query, but we should filter before rendering on the map.

cc @ajithranka @maning @batpad

geohacker avatar Mar 01 '17 09:03 geohacker

Example https://osmlab.github.io/changeset-map/#46476343/way/340411236

geohacker avatar Mar 01 '17 09:03 geohacker

We can add the filtering here: https://github.com/osmlab/changeset-map/blob/gh-pages/js/overpass.js#L28-L29. Each geojson feature has changeset property that we can filter against.
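
A minimal sketch of that filtering step, written in Python for brevity rather than in the JavaScript of `js/overpass.js`. The function name `filter_by_changeset` and the sample features are made up for illustration; the only thing taken from this thread is that each GeoJSON feature carries a `changeset` property. Comparing as strings is an assumption, to cover the property arriving as either a number or a string.

```python
def filter_by_changeset(geojson, changeset_id):
    """Return a copy of the FeatureCollection with only the features
    whose `changeset` property matches changeset_id."""
    wanted = str(changeset_id)
    kept = [
        feature
        for feature in geojson.get("features", [])
        if str(feature.get("properties", {}).get("changeset")) == wanted
    ]
    return {**geojson, "features": kept}


# Made-up sample: ids 1 and 2 are placeholders, not real OSM objects.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"id": 1, "changeset": "46476343"}},
        {"type": "Feature", "geometry": None,
         "properties": {"id": 2, "changeset": "46476422"}},
    ],
}

filtered = filter_by_changeset(sample, 46476343)
```

Filtering before rendering keeps the map layer code unchanged; only the input collection shrinks.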

ajithranka avatar Mar 01 '17 09:03 ajithranka

I have a problem with OSMCha: it does not show what the OSM website shows. Where it should show a modification, it shows a deletion. See changeset https://osmcha.mapbox.com/46634397/

This happens in all US cities.

@maning @geohacker

luiswalter avatar Mar 07 '17 17:03 luiswalter

Reopening to continue the conversation.

The fix in #103 filters features by changeset id. The changeset mentioned above (46634397) has two added and two modified ways, but changeset-map doesn't show the modified features.

[Screenshot, 2017-03-08 10:21 am]
  • http://osmlab.github.io/changeset-map/#46634397
  • https://www.openstreetmap.org/api/0.6/changeset/46634397/download

cc: @geohacker @batpad

ajithranka avatar Mar 08 '17 04:03 ajithranka

Looking at the JSON response for CS 46634397 and comparing it with the XML format, only the old versions of the modified ways 23392012 and 183228991 are included. Don't know if the JSON output has changed recently, but I wonder how changeset-map can actually work with that?

As far as I know, the JSON output format has never officially been supported for adiff queries (see this comment and drolbr/Overpass-API#363).

But it seems you are already looking into it: mapbox/osm-adiff-parser?

nrenner avatar Mar 09 '17 09:03 nrenner

@nrenner You got it. The JSON format is becoming a bit of a problem when trying to identify modified features. We are working on moving to the XML format now and should have it working soon.

ajithranka avatar Mar 09 '17 09:03 ajithranka

But even with filtering and XML adiff format there will still be an issue when the same object is changed in "overlapping" changesets, see also nrenner/achavi#10.

For example, the node 4710487900 created in changeset 46476343 is now missing: https://osmlab.github.io/changeset-map/#46476343

This is because the node was modified in the subsequent changeset 46476422. Only this later version 2 is returned by the adiff query and then filtered out because of the different changeset.

The second changeset is also included because the first got automatically closed after a one hour timeout; the actual edit time ranges don't overlap (see the changeset and osmChange XMLs):

46476343: created_at="2017-02-28T19:31:48Z"
           closed_at="2017-02-28T20:32:34Z"  - created_by = OsmAnd+ 2.5.4
        
           timestamp="2017-02-28T19:32:34Z"  - last edit (node 4710488411)

46476422: created_at="2017-02-28T19:34:30Z"
           closed_at="2017-02-28T19:34:30Z"  - created_by = iD 2.1.3 
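
The arithmetic behind those timestamps can be checked with a short Python sketch. The variable names are illustrative; the timestamps are the ones listed above. Treating "created_at to last edit" as the edit range of the first changeset is a simplifying assumption.

```python
from datetime import datetime


def parse(ts):
    """Parse an OSM-style UTC timestamp such as 2017-02-28T19:31:48Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


# Changeset 46476343 (OsmAnd+)
first_created = parse("2017-02-28T19:31:48Z")
first_closed = parse("2017-02-28T20:32:34Z")
first_last_edit = parse("2017-02-28T19:32:34Z")

# Changeset 46476422 (iD)
second_created = parse("2017-02-28T19:34:30Z")
second_closed = parse("2017-02-28T19:34:30Z")

# Time between the last edit and the close: exactly the one hour timeout.
idle_before_close = first_closed - first_last_edit

# The open/close ranges overlap ...
ranges_overlap = first_created <= second_closed and second_created <= first_closed

# ... but the ranges in which edits were actually made do not.
edits_overlap = first_created <= second_closed and second_created <= first_last_edit

print(idle_before_close, ranges_overlap, edits_overlap)
```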

nrenner avatar Mar 09 '17 17:03 nrenner

@nrenner thank you so much for your inputs here -

The issue with overlapping changesets is indeed a nasty one. The proposed solution(s) you linked to in the ticket seem potentially interesting, but hard. I think the essential issue here is that we are trying to represent a changeset as some sort of atomic set of changes, whereas in reality that is not the case in OpenStreetMap, where a changeset can stay open for up to 24 hours and others can change the same features between the time it is opened and the time it is closed.

This is definitely an edge case to be aware of, though I am not sure exactly how to solve it. It would be wonderful if we could at least detect these cases reliably. I think parsing the OSM changeset XML could be interesting; I will look into it.

I think in the long run we need a dedicated back-end system to detect atomic changes reliably, and using overpass for this always seems like a small bit of a hack.

My other data concern is the following scenario:

If someone moves a node that is part of a way / larger relation, we only get data for the node change, since that does not increment the version of the higher order way / relation that it also affected. This makes it hard to detect some types of changes / be able to visualize that a harmless looking change to a node could have caused much larger breakage.

In the long run, I would love to collaborate toward building a more reliable back-end that is more specifically tailored to accurately representing changes in a changeset and doesn't require the work-arounds that are currently needed when querying the overpass adiff (which fulfills an absolutely great function but is just not 100% suitable for this use case).

For now, I think the adiff query gives us the closest we have to an accurate representation of a changeset, and we really appreciate the work in achavi that allowed us to build off the query pattern, and I really hope we can help each other find solutions to some of the nastier edge cases here. Thank you again for your inputs here, and hope to continue the conversation and improve how we can reliably represent changes in a changeset :-)

batpad avatar Mar 10 '17 06:03 batpad

@batpad: why don't you open an issue with overpass API? The data for your use case for sure is available in the backend, it just doesn't get exposed in a suitable way.

mmd-osm avatar Mar 19 '17 18:03 mmd-osm

A minimal solution would be to at least show a warning when there might be features missing because of an overlap. Indicators are version gaps in changes with other changesets, e.g. creates with version > 1, like in this case. But these could also just be multiple uploads in the other changeset.
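
A rough Python sketch of that indicator, under stated assumptions: the parsed adiff is taken to be a list of dicts with the hypothetical keys `type`, `id`, `version` and `changeset`, which is not necessarily how changeset-map or osm-adiff-parser represent it. The function name is made up as well. It flags only the "create with version > 1 in another changeset" case.

```python
def possible_overlap_warnings(actions, changeset_id):
    """Return the actions that hint at a missing feature version:
    a 'create' whose returned version is > 1 and that belongs to a
    different changeset than the one being displayed."""
    return [
        action
        for action in actions
        if action["type"] == "create"
        and action["version"] > 1
        and action["changeset"] != changeset_id
    ]


# Node 4710487900 is the case from this thread: created in 46476343,
# but the adiff only returns version 2 from changeset 46476422.
# The second entry (id 1) is a made-up placeholder for a normal create.
actions = [
    {"type": "create", "id": 4710487900, "version": 2, "changeset": 46476422},
    {"type": "create", "id": 1, "version": 1, "changeset": 46476343},
]

warnings = possible_overlap_warnings(actions, 46476343)
```

As noted, a hit is only a hint: the same pattern also appears when the other changeset simply had multiple uploads.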

Maybe verify or load such missing versions (when not too many) via:

The number of such overlaps could probably be greatly reduced by not including one hour changeset timeouts in the query time range, but how to find out if there was a timeout?

> I think the essential issue here is that we are trying to represent a changeset as some sort of atomic set of changes, whereas in reality that is not the case

Fully agree, querying for atomic uploads instead would be a big improvement, not only for this issue. Roland Olbricht also briefly mentioned this at a SotM 2016 talk (slide 17). I don't know if there are any plans, but it would be worth discussing with the Overpass devs. A mid-term solution could be a service that provides a changeset's approximated upload start/end timestamps by grouping/clustering minutely diff edit timestamps.
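
One way such grouping could look, as a hedged Python sketch: edit timestamps of one changeset are sorted and split into windows wherever the gap between consecutive edits exceeds a threshold. The function name, the five minute threshold and the sample timestamps are all assumptions for illustration, not an existing service or API.

```python
from datetime import datetime, timedelta


def cluster_edit_times(timestamps, max_gap=timedelta(minutes=5)):
    """Group edit timestamps into (start, end) windows; a new window
    starts whenever the gap to the previous edit exceeds max_gap."""
    times = sorted(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") for ts in timestamps
    )
    windows = []
    for t in times:
        if windows and t - windows[-1][1] <= max_gap:
            windows[-1][1] = t  # extend the current window
        else:
            windows.append([t, t])  # start a new window
    return [(start, end) for start, end in windows]


# Made-up edit timestamps: two uploads roughly 40 minutes apart.
edits = [
    "2017-02-28T19:31:50Z",
    "2017-02-28T19:32:34Z",
    "2017-02-28T20:12:10Z",
]

windows = cluster_edit_times(edits)
```

Each window could then be used as its own adiff query time range instead of the changeset's full created_at/closed_at range.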

> My other data concern is the following scenario: If someone moves a node that is part of a way / larger relation, we only get data for the node change, since that does not increment the version of the higher order way / relation that it also affected.

Are you aware that Augmented Diffs/adiff actually do include affected ways/relations and their old and new geometries for node/way changes?

  • Ways are considered as changed when their members have changed or one of its members has changed its coordinate. Relations are considered as changed when their members have changed or one of its way members has changed by its members or coordinates or one of the node members has changed its coordinate.

    https://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs#Contained_data

And yes, using the adiff query for getting changeset data in the current way is a hack, but is good enough for the majority of cases (but less than I hoped) and still better than the only current alternative of hammering the main API with object version requests. I partly got stuck with achavi development wondering whether it's worth the time searching for workarounds for a hack when there really should be a better back-end service for this.

Especially since visualizing changesets was only an additional feature I couldn't resist adding, while the main intention for achavi was actually to monitor all changes in an area. I also started too many projects, so I would be rather relieved if changeset-map could replace this feature or if I could reuse some common code, and I'm happy if I can contribute at least a bit to make this happen.

nrenner avatar Mar 22 '17 11:03 nrenner

Similar to the overlapping changesets issue, there is also an issue with reading changesets from minutely augmented diffs when the same object is changed in two or more changesets within the same minute.

Example:

changeset_id  num_changes_extract  previous_changeset_ids  object_ids      last_timestamp         user
54030466      1                    54030462                205439375v3/v2  t2017-11-23T16:46:25Z  ualxsob

  • CS 54030466 way 205439375 v3 2017-11-23T16:46:25Z - compares with v1 instead of v2
  • CS 54030462 way 205439375 v2 2017-11-23T16:46:20Z - error because changeset JSON does not exist (in other cases error because object is missing)
  • changeset id="54030466" created_at="2017-11-23T16:46:24Z" closed_at="2017-11-23T17:07:14Z"
    changeset id="54030462" created_at="2017-11-23T16:46:19Z" closed_at="2017-11-23T17:07:14Z"
    54030466 XML, 54030462 XML

It seems StreetComplete opens parallel changesets, one for each quest. This works because it does live editing where each object is uploaded immediately.

There are 45 cases with ways in November in the extract. Some more selected examples:

StreetComplete
53423684 2 53423677 130625913v6/v5,171575209v8/v7 t2017-11-01T13:13:19Z ulooniverse
54117181 4 54117169,54117176 19603904v17/v16,19605400v9/v8,19605401v9/v8 t2017-11-27T09:51:50Z uMary%20%Sal
54168906 1 54168901 180171645v5/v4 t2017-11-29T07:18:52Z umonachus_de

iD
53404164 1 53404158 25648104v32/v31 t2017-10-31T18:44:19Z ukanoe321
53404435 1 53404428 24769041v42/v41 t2017-10-31T18:51:48Z uadele45
53914421 2 53914415 25628424v22/v21 t2017-11-19T01:50:34Z uGerold

JOSM
53242236 2 53242204 28633009v11/v10,223636972v16/v15 t2017-10-25T17:53:51Z uOF-1

Examples found with full history extract obtained from Geofabrik downloads and the Osmium Tool OPL File Format:

wget http://download.geofabrik.de/europe/germany/baden-wuerttemberg/tuebingen-regbez.osh.pbf
osmium cat tuebingen-regbez.osh.pbf -o tuebingen-regbez.osh.opl
awk -F" " '{c=substr($4,2)} /^w.* t2017-11/ {o=substr($1,2);t=substr($5,1,17); if (po==o && pc!=c && pt==t) {if (index(cc[c],pc) == 0) cc[c]=cc[c]","pc; co[c]=co[c]",["o$2"](http://osmlab.github.io/changeset-map/#"c"/way/"o")/["p2"](http://osmlab.github.io/changeset-map/#"pc"/way/"po")"; cd[c]=$5" "$7}; p0=$0; po=o; p2=$2; pc=c; pt=t} {cn[c]++} END{for(i in co) print "["i"](https://osm.org/changeset/"i")",cn[i],substr(cc[i],2),substr(co[i],2),cd[i]"  " }' tuebingen-regbez.osh.opl | \
  sort -n

Collects changesets and objects where previous line has same object but different changeset within the same minute; assumes ordered by type, id, version.
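
For readability, the core check of the awk one-liner can be restated as a Python sketch. This is a simplified re-implementation, not the command that produced the lists above: it skips the per-changeset change count and the link formatting, and the function name is made up. It assumes OPL way lines with the id, version, changeset and timestamp in fields 1, 2, 4 and 5, as the awk command does. In the sample lines the uid `i0` is a placeholder; the other values are from the example above.

```python
def find_same_minute_conflicts(opl_lines, month="2017-11"):
    """Map changeset id -> list of (way, version, previous changeset,
    previous version) where consecutive versions of the same way were
    uploaded in different changesets within the same minute.
    Assumes lines are ordered by type, id, version."""
    conflicts = {}
    prev = None
    for line in opl_lines:
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("w"):
            continue  # only ways are considered
        timestamp = fields[4][1:]  # strip the leading "t"
        if not timestamp.startswith(month):
            continue
        current = {
            "way": fields[0][1:],
            "version": fields[1][1:],
            "changeset": fields[3][1:],
            "minute": timestamp[:16],  # e.g. 2017-11-23T16:46
        }
        if (
            prev is not None
            and prev["way"] == current["way"]
            and prev["changeset"] != current["changeset"]
            and prev["minute"] == current["minute"]
        ):
            conflicts.setdefault(current["changeset"], []).append(
                (current["way"], current["version"],
                 prev["changeset"], prev["version"])
            )
        prev = current
    return conflicts


lines = [
    "w205439375 v2 dV c54030462 t2017-11-23T16:46:20Z i0 ualxsob T N",
    "w205439375 v3 dV c54030466 t2017-11-23T16:46:25Z i0 ualxsob T N",
]

conflicts = find_same_minute_conflicts(lines)
```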

nrenner avatar Dec 13 '17 13:12 nrenner

@nrenner was great to meet you at SOTM, Heidelberg and pick up some of these issues. Commenting on this ticket mostly to remind myself of its existence and look at some of your examples more closely at some point.

Since you had asked, these are the repositories that handle the conversion from the overpass adiff XML to GeoJSON representations:

  • Create the JSONs, grouping by changeset ID: https://github.com/mapbox/osm-adiff-parser
  • Convert to "real changeset" GeoJSON: https://github.com/mapbox/real-changesets-parser

These are definitely super interesting edge cases, and even if they can't always be solved for, it would be great to have a way to recognize them / warn users.

Thank you for the great conversations.

batpad avatar Sep 25 '19 11:09 batpad