oshdb
Literature suggests error in Contributions pre 2011
This article (Neis and Zipf (2012) Analyzing the Contributor Activity of a Volunteered Geographic Information Project — The Case of OpenStreetMap, https://www.mdpi.com/2220-9964/1/2/146) suggests that users might have increased the version of an element without touching the element (p. 155, bottom)¹. The text is rather unclear about which editors are affected and what exactly the error does.
- [ ] It should therefore be investigated what this error does and whether users of oshdb can be notified of it or protected from it.
¹ "[...] It is important to note that for this particular method to determine the activity area polygon of a member, only Nodes that a member created were included, no edited Nodes or deleted Nodes were considered. Initial calculations that included all Nodes showed some irregularities, which were based on a software error in the OSM editors in the past (before 2011). This error increased the version number of a Node although the object was not changed in any way by any user directly, but because the Node would fall into the range of a certain changeset. Thus, the database would count a change to a Node, although the member did not actually edit the data. It is important to consider these errors when conducting similar studies to [32,40,43,44], in which the versions of an OSM object should be based on real changes and not primarily on the number of editors and the absolute version number. [...]"
//cc @pa5cal: can you perhaps clarify what kind of software error is meant in this paragraph?
I think this is the most important sentence: "This error increased the version number of a Node although the object was not changed in any way by any user directly". So the OSM element has a new (higher) version number, but the contributor changed neither its tags nor its geometry. If I remember correctly, this only happened with the Potlatch editor and for Nodes which fall into the bounding box of the contributor's changeset. It is therefore better to filter out such modifications when you make use of an element's version number and its different contributors. Hope that helps.
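The filtering described above could look roughly like the following sketch. It does not use the OSHDB API: `NodeVersion`, `is_version_only_change` and `real_changes` are hypothetical names made up for illustration, and the assumption is that a Node's history is available as a list ordered by version number.

```python
from dataclasses import dataclass, field


@dataclass
class NodeVersion:
    """Hypothetical, simplified snapshot of one version of an OSM Node."""
    version: int
    user_id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)
    visible: bool = True


def is_version_only_change(prev, curr):
    """True if curr differs from prev in nothing but metadata (version, user)."""
    return (
        curr.visible == prev.visible
        and curr.lat == prev.lat
        and curr.lon == prev.lon
        and curr.tags == prev.tags
    )


def real_changes(history):
    """Keep only versions that changed tags, geometry or visibility.

    `history` is assumed to be ordered by version number. The first
    version (the creation) is always kept.
    """
    kept = []
    prev = None
    for current in history:
        if prev is None or not is_version_only_change(prev, current):
            kept.append(current)
        prev = current
    return kept
```

With this, the number of "real" edits and the contributors behind them can be taken from `real_changes(history)` instead of from the absolute version number.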
I did a quick analysis of this today. Below are the results, but note that I haven't done any double-checking on the numbers yet, so consume them with a bit of caution for now. It seems like this was indeed a big issue back in the day (2008-2009 mostly). Interestingly, these kinds of editor issues seem to have made a comeback in the past few years, although they are increasing only very slowly and to a much lesser extent. Would certainly be interesting to investigate this further, though.
seems to be strongly related: https://gitlab.gistools.geog.uni-heidelberg.de/giscience/big-data/ohsome/oshdb/issues/127
proposed solution:
Introduce ContributionType.VersionNoOnly
with a well defined specification:
"This ContributionType is present if there is no change to the object but the version-no. In general this type should be excluded from analyses because it might be related to bugs in the editor-software of OSM (see https://github.com/GIScience/oshdb/issues/87). It is primarily meant for bugtracking."
solution is also proposed in #113
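A minimal sketch of how such a type could be assigned, assuming each contribution is described by a "before" and an "after" state. This is not OSHDB code: the enum members, the dict keys (`version`, `tags`, `geometry`) and the function `classify` are hypothetical and only illustrate the specification quoted above.

```python
from enum import Enum, auto


class ContributionType(Enum):
    """Hypothetical enum for illustration; not the OSHDB class of the same name."""
    CREATION = auto()
    DELETION = auto()
    TAG_CHANGE = auto()
    GEOMETRY_CHANGE = auto()
    VERSION_NO_ONLY = auto()  # nothing changed but the version number


def classify(before, after):
    """Return the set of contribution types for one modification.

    `before` and `after` are dicts with the assumed keys "version",
    "tags" and "geometry"; None means the object does not exist.
    """
    if before is None:
        return {ContributionType.CREATION}
    if after is None:
        return {ContributionType.DELETION}
    types = set()
    if before["tags"] != after["tags"]:
        types.add(ContributionType.TAG_CHANGE)
    if before["geometry"] != after["geometry"]:
        types.add(ContributionType.GEOMETRY_CHANGE)
    # Only flag a version-only contribution when nothing else changed.
    if not types and after["version"] > before["version"]:
        types.add(ContributionType.VERSION_NO_ONLY)
    return types
```

An analysis would then skip every contribution whose type set equals `{ContributionType.VERSION_NO_ONLY}`, while a bug-tracking query would select exactly those.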
This is interesting for an analysis such as a bachelor's or master's thesis, but not in scope of OSHDB.