
Literature suggests error in Contributions pre 2011

Open SlowMo24 opened this issue 5 years ago • 6 comments

This article (Neis and Zipf (2012) Analyzing the Contributor Activity of a Volunteered Geographic Information Project — The Case of OpenStreetMap, https://www.mdpi.com/2220-9964/1/2/146) suggests that users might have increased the version of an element without touching the element (p. 155, bottom)¹. The text is rather unclear on which editors are affected and what exactly the error does.

  • [ ] It should therefore be investigated what this error does and whether users of oshdb can be notified of it or protected from it.

¹ "[...] It is important to note that for this particular method to determine the activity area polygon of a member, only Nodes that a member created were included, no edited Nodes or deleted Nodes were considered. Initial calculations that included all Nodes showed some irregularities, which were based on a software error in the OSM editors in the past (before 2011). This error increased the version number of a Node although the object was not changed in any way by any user directly, but because the Node would fall into the range of a certain changeset. Thus, the database would count a change to a Node, although the member did not actually edit the data. It is important to consider these errors when conducting similar studies to [32,40,43,44], in which the versions of an OSM object should be based on real changes and not primarily on the number of editors and the absolute version number. [...]"

SlowMo24 avatar Mar 05 '19 12:03 SlowMo24

//cc @pa5cal: can you perhaps clarify what kind of software error is meant in this paragraph?

tyrasd avatar Mar 06 '19 08:03 tyrasd

I think this is the most important sentence: "This error increased the version number of a Node although the object was not changed in any way by any user directly". So the OSM element has a new (higher) version number, but the contributor changed neither its tags nor its geometry. If I remember correctly, this only happened with the Potlatch editor and for Nodes that fall into the bounding box of the contributor's changeset. You should therefore filter out such modifications when you use the element's version number and its different contributors. Hope that helps.
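The filtering described above can be sketched as follows. This is a minimal illustration in plain Java, not OSHDB code: `NodeVersion` and `RealChangeFilter` are hypothetical names, and the sketch assumes the history is sorted by ascending version number and that "unchanged" means identical visibility, coordinates and tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified snapshot of one node version (not an OSHDB class).
final class NodeVersion {
    final int version;
    final double lon;
    final double lat;
    final Map<String, String> tags;
    final boolean visible;

    NodeVersion(int version, double lon, double lat,
                Map<String, String> tags, boolean visible) {
        this.version = version;
        this.lon = lon;
        this.lat = lat;
        this.tags = tags;
        this.visible = visible;
    }

    // True if nothing but the version number differs between the two snapshots.
    boolean sameContentAs(NodeVersion other) {
        return visible == other.visible
            && Double.compare(lon, other.lon) == 0
            && Double.compare(lat, other.lat) == 0
            && tags.equals(other.tags);
    }
}

final class RealChangeFilter {
    // Keeps the first version and every later version whose content
    // differs from its direct predecessor. Assumes ascending version order.
    static List<NodeVersion> realChanges(List<NodeVersion> history) {
        List<NodeVersion> result = new ArrayList<>();
        NodeVersion previous = null;
        for (NodeVersion current : history) {
            if (previous == null || !current.sameContentAs(previous)) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }
}
```

Counting the entries of `realChanges(history)` instead of reading the absolute version number gives a per-node edit count that ignores pure version bumps.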

pa5cal avatar Mar 09 '19 11:03 pa5cal

I did a quick analysis of this today. Below are the results, but note that I haven't double-checked the numbers yet, so treat them with a bit of caution for now. It seems like this was indeed a big issue back in the day (2008-2009 mostly). Interestingly, this kind of editor issue seems to have made a comeback in the past few years, although it is increasing only very slowly and to a much lesser extent. It would certainly be interesting to investigate this further, though.

[Figure: node-non-edits]

tyrasd avatar Mar 13 '19 10:03 tyrasd

seems to be strongly related: https://gitlab.gistools.geog.uni-heidelberg.de/giscience/big-data/ohsome/oshdb/issues/127

SlowMo24 avatar Mar 13 '19 11:03 SlowMo24

proposed solution: Introduce ContributionType.VersionNoOnly with a well defined specification:

"This ContributionType is present if there is no change to the object but the version-no. In general this type should be excluded from analyses because it might be related to bugs in the editor-software of OSM (see https://github.com/GIScience/oshdb/issues/87). It is primarily meant for bugtracking."
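A rough sketch of how such a type could behave, written as standalone Java rather than against the actual OSHDB API. Everything here is an assumption for illustration: the enum `ContributionKind`, its constant names, and the `ContributionClassifier` helper are hypothetical; only the idea of a "version number only" type comes from the proposal above.

```java
import java.util.EnumSet;

// Hypothetical enum; VERSION_NO_ONLY mirrors the proposed VersionNoOnly type.
enum ContributionKind {
    CREATION, DELETION, TAG_CHANGE, GEOMETRY_CHANGE, VERSION_NO_ONLY
}

final class ContributionClassifier {
    // Classifies the step from one version of an object to the next.
    // The boolean flags are assumed to be computed by the caller.
    static EnumSet<ContributionKind> classify(boolean existedBefore,
                                              boolean existsAfter,
                                              boolean tagsChanged,
                                              boolean geometryChanged) {
        if (!existedBefore && existsAfter) {
            return EnumSet.of(ContributionKind.CREATION);
        }
        if (existedBefore && !existsAfter) {
            return EnumSet.of(ContributionKind.DELETION);
        }
        EnumSet<ContributionKind> kinds = EnumSet.noneOf(ContributionKind.class);
        if (tagsChanged) {
            kinds.add(ContributionKind.TAG_CHANGE);
        }
        if (geometryChanged) {
            kinds.add(ContributionKind.GEOMETRY_CHANGE);
        }
        // A new version with no other change is a pure version bump.
        if (kinds.isEmpty()) {
            kinds.add(ContributionKind.VERSION_NO_ONLY);
        }
        return kinds;
    }

    // Analyses that should reflect real edits would skip pure version bumps.
    static boolean countsAsRealEdit(EnumSet<ContributionKind> kinds) {
        return !kinds.contains(ContributionKind.VERSION_NO_ONLY);
    }
}
```

Keeping the version-only case as its own constant, rather than silently dropping it, leaves the affected contributions available for bug tracking while letting analyses exclude them explicitly.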

SlowMo24 avatar Mar 13 '19 12:03 SlowMo24

solution is also proposed in #113

SlowMo24 avatar Apr 03 '19 14:04 SlowMo24

This would be interesting for an analysis such as a bachelor's or master's thesis, but it is not in the scope of OSHDB.

Hagellach37 avatar Oct 20 '22 12:10 Hagellach37