
Missing changes in long open/multi-upload, large bbox changesets

nrenner opened this issue 2 years ago

I'm submitting a bug report

Brief Description

Not all changes are shown for changesets with a large bounding box and a long open time span with multiple uploads, e.g. 133929698 (OSM).

What is the current behaviour?

The total changes count for a changeset does not match the number of changes shown on the map and in the changes tabs ("tag changes" only in this case):

screenshot

Out of 100 changesets in a sample I took from user wheelmap_visitor:

  • 37 have missing changes
  • 4 are empty (#650)
  • 4 with a bbox > 1 deg² are complete

What is the expected behaviour?

As in #650, I would assume this has worked in the past, but I don't really know in this case.

When does this occur?

This might be the same root cause as in #650. Cases in the sample start at a bbox size larger than about 6 "square degrees" (simple width * height from the bbox coordinates).

An additional condition for missing changes seems to be that a changeset is kept open for a long time, > 4500 seconds (75 minutes) in this sample, and/or that there are multiple uploads. The Wheelmap editor rosemary apparently uploads each object change immediately, and a changeset stays open until it times out after an hour.
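For reference, a minimal Python sketch of the two measures used in this report: bbox size as simple width * height in degrees, and open time in seconds. The field names are assumptions based on the standard OSM changeset attributes and may differ in the real-changesets JSON:

```python
# Sketch: the two conditions observed in the sample, computed from
# changeset metadata. Field names (min_lon, max_lon, created_at, ...) are
# assumed to mirror the standard OSM changeset attributes.
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def bbox_deg2(meta):
    # "square degrees": simple width * height of the bounding box
    width = float(meta["max_lon"]) - float(meta["min_lon"])
    height = float(meta["max_lat"]) - float(meta["min_lat"])
    return width * height

def open_seconds(meta):
    # how long the changeset stayed open
    opened = datetime.strptime(meta["created_at"], FMT)
    closed = datetime.strptime(meta["closed_at"], FMT)
    return (closed - opened).total_seconds()
```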

How do we replicate the issue?

  1. Click on the changeset id link in the examples below and compare the total changes shown in the changeset list or tab with the number of changes in the "tag changes" tab (wheelmap_visitor only adds/modifies wheelmap tags).
    The "changes expected" numbers in the tables are taken from the S3 JSON metadata.changes_count; the "changes actual" numbers come from counting the elements that carry the same changeset id (see the sketch after this list).
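As a rough illustration of that comparison, here is a minimal Python sketch. The S3 URL pattern is the one used elsewhere in this thread; the "elements" key and the per-element "changeset" field are assumptions about the real-changesets JSON layout and may need adjusting:

```python
# Sketch: compare the expected count (metadata.changes_count) with the
# number of elements actually stored for one changeset.
# Assumption: the JSON has a "metadata" object and an "elements" array in
# which each element carries a "changeset" id.
import json
import urllib.request

def check_changeset(changeset_id):
    url = ("https://s3.amazonaws.com/mapbox/real-changesets/production/"
           f"{changeset_id}.json")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    expected = int(data["metadata"]["changes_count"])
    actual = sum(1 for el in data.get("elements", [])
                 if str(el.get("changeset")) == str(changeset_id))
    return actual, expected

actual, expected = check_changeset(133586098)
print(f"133586098: actual {actual}, expected {expected}")
```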

Missing examples:

| changeset | changes actual | changes expected | open for (seconds) | bbox size (deg²) | editor |
| --- | --- | --- | --- | --- | --- |
| 133586098 (OSM, json) | 2 | 6 | 6253 | 6 | rosemary v0.4.4 |
| 133875017 (OSM, json) | 1 | 2 | 7068 | 8 | rosemary v0.4.4 |
| 133775713 (OSM, json) | 7 | 12 | 12099 | 33 | rosemary v0.4.4 |
| 134031519 (OSM, json) | 3 | 35 | 8535 | 154 | rosemary v0.4.4 |
| 133803143 (OSM, json) | 1 | 5 | 4493 | 1485 | rosemary v0.4.4 |
| 134027190 (OSM, json) | 1 | 2 | 5442 | 2877 | rosemary v0.4.4 |
| 134051281 (OSM, json) | 4 | 369 | 37203 | 3374 | rosemary v0.4.4 |
| 133545073 (OSM, json) | 35 | 88 | 27829 | 4470 | rosemary v0.4.4 |

Complete examples:

| changeset | changes actual | changes expected | open for (seconds) | bbox size (deg²) | editor |
| --- | --- | --- | --- | --- | --- |
| 133657644 (OSM, json) | 3 | 3 | 4956 | 6 | rosemary v0.4.4 |
| 133763487 (OSM, json) | 13 | 13 | 10199 | 6 | rosemary v0.4.4 |
| 133567836 (OSM, json) | 2 | 2 | 4167 | 7 | rosemary v0.4.4 |

Other Information / context:

Related:

  • #650

nrenner, Mar 30 '23 14:03

Thanks for the detailed report, @nrenner!

> Not all changes are shown for changesets with a large bounding box and a long open time span with multiple uploads, e.g. 133929698 (OSM).

Probably, in that case, because the bbox is so large and the changeset stayed open for so long, the Overpass query timed out and the S3 file could not be updated when the changeset was closed.

I can see that https://s3.amazonaws.com/mapbox/real-changesets/production/133929698.json still has open: true, and the latest feature change timestamp is 2023-03-21T08:47:09Z.

I checked, and the Overpass query we execute is the same as the one achavi uses, but our Overpass server doesn't respond to such a large query.
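For illustration, a sketch of the kind of per-changeset adiff query being discussed, covering the full open time range plus the changeset bbox. This is not necessarily the exact query osmcha or achavi run, and the values below are made up:

```python
# Sketch of a per-changeset adiff query: full created_at..closed_at range
# plus the changeset bbox (south,west,north,east in Overpass QL). It only
# illustrates why a huge bbox and a long time span make the query
# expensive enough to time out.

def adiff_query(created_at, closed_at, south, west, north, east):
    bbox = f"{south},{west},{north},{east}"
    return (
        f'[adiff:"{created_at}","{closed_at}"];'
        f"(node({bbox});way({bbox});rel({bbox}););"
        "out meta geom;"
    )

# Illustrative values only, not changeset 133929698's real metadata.
print(adiff_query("2023-03-21T07:00:00Z", "2023-03-21T09:00:00Z",
                  47.0, 5.0, 55.0, 15.0))
```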

willemarcel, Mar 30 '23 15:03

@willemarcel thanks for the quick response and checking!

> I can see that https://s3.amazonaws.com/mapbox/real-changesets/production/133929698.json still has open: true

Actually, all wheelmap_visitor changesets, even the working ones, are still flagged as open: true and don't have a closed_at timestamp in the S3 JSON. For example, this single-object upload with a zero bbox, open for one hour: 134041593.json (OSM), and also the ones in the second "complete" table above. The changes_count is always correct, though.

> I checked, and the Overpass query we execute is the same as the one achavi uses

Probably not exactly the same? Both the achavi and the changeset-map fallback queries use the full created_at to closed_at time range of the changeset. If that query fails, there will be no changes at all, as in #650. So I wonder why the cases in this issue do have some changes.

The setup described in the geohacker diary uses Augmented Diffs:

> Overpass offers augmented diffs between two timestamps that contains current and previous versions of each feature that changed in that period. We put together an infrastructure that queries Overpass minutely, prepares changeset representation as a JSON, and stashes them on S3.

Those Overpass minutely augmented diffs can be queried either by sequence id (like OSM minutely diffs), e.g. http://overpass-api.de/api/augmented_diff?id=5546258, or using a query (see http://overpass-api.de/api/augmented_diff?id=5546258&debug=yes):

```
[adiff:"2023-03-30T20:33:00Z","2023-03-30T20:34:00Z"];
(
  node(changed:"2023-03-30T20:33:00Z","2023-03-30T20:34:00Z");
  way(changed:"2023-03-30T20:33:00Z","2023-03-30T20:34:00Z");
  rel(changed:"2023-03-30T20:33:00Z","2023-03-30T20:34:00Z");
);
out meta geom;
```

So it is basically also an adiff query like the ones in achavi and changeset-map, but world-wide, without a bbox, and with a time range of one minute.

The osm-adiff-parser would then group all changes within a minute by changeset and update the S3 JSONs of those.

In this process, the bbox or duration of an individual changeset doesn't matter at all, which is why I considered this process superior to individual changeset queries. Issues like #650 and this one shouldn't occur.
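A minimal sketch of that grouping step, assuming a flat list of changed elements from one minutely diff. The element structure here is hypothetical; osm-adiff-parser's real input and output formats may differ:

```python
# Sketch: bucket the elements of one minutely augmented diff by changeset
# id, so that each changeset's S3 JSON can be updated with just its own
# changes. The element dicts below are hypothetical.
from collections import defaultdict

def group_by_changeset(adiff_elements):
    changesets = defaultdict(list)
    for element in adiff_elements:
        changesets[element["changeset"]].append(element)
    return changesets

minute_diff = [
    {"type": "node", "id": 1, "changeset": 133586098, "version": 2},
    {"type": "way", "id": 2, "changeset": 133586098, "version": 5},
    {"type": "node", "id": 3, "changeset": 134041593, "version": 1},
]
for cs_id, elements in group_by_changeset(minute_diff).items():
    print(cs_id, len(elements), "changes in this minute")
```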

Therefore I wonder if this setup has changed, and why?

nrenner, Mar 30 '23 21:03