osmcha-frontend
Empty elements in S3 JSON for large bbox changesets
I'm submitting a bug report
Brief Description
Changesets with a large bounding box have an empty elements property in the real-changesets JSON file on S3, e.g.
https://s3.amazonaws.com/mapbox/real-changesets/production/133792960.json (OSMCha, OSM)
What is the current behaviour ?
When opening a large bbox changeset, a spinning wheel appears for three minutes, then the map and changes tabs are empty.
Technically, the client requests the cached real-changesets JSON from S3 to get the diffs for the changed features. As the elements property contains no features, the client sends a fallback adiff query directly to the Overpass API, which times out after 180 seconds.
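The lookup-then-fallback flow described above can be sketched roughly as follows. This is not the OSMCha client code: the helper names are hypothetical, and only the S3 URL pattern and the elements property are taken from this report.

```python
from typing import Callable

# URL pattern taken from the example link in this report.
S3_BASE = "https://s3.amazonaws.com/mapbox/real-changesets/production"


def s3_url(changeset_id: int) -> str:
    """Build the URL of the cached real-changesets JSON for a changeset."""
    return f"{S3_BASE}/{changeset_id}.json"


def has_elements(cached: dict) -> bool:
    """True when the cached JSON carries at least one changed feature."""
    return bool(cached.get("elements"))


def load_elements(cached: dict, fallback: Callable[[], list]) -> list:
    """Use the cached features if present, otherwise call the fallback.

    `fallback` stands in for the on-demand adiff query against Overpass
    (hypothetical callable; the real query is not reproduced here).
    """
    if has_elements(cached):
        return cached["elements"]
    return fallback()
```

With an empty elements list, `load_elements` always ends up in the fallback, which is the slow path that hits the 180-second timeout for large bounding boxes.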

What is the expected behaviour ?
OSMCha used to support large bbox changesets by processing world-wide minutely augmented diffs from Overpass API, as described in these posts:
- Preparing accurate history and caching changesets (geohacker diary)
- cached augmented diffs differ from on-demand adiffs - Overpass-API#346 (geohacker comment)
So I wonder why this has no longer been the case for some time. I suspect augmented diffs were replaced at some point by individual adiff queries, like in the client. If so, what was the reason?
When does this occur ?
Seemingly for bounding boxes larger than about 5 "square degrees" (simple width * height from bbox coordinates). It probably also depends on other factors, such as how long the changeset was open (the time span between created_at and closed_at).
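For clarity, the two measures used in the tables below can be computed like this. A minimal sketch: the function names are hypothetical, the bbox in the usage example is made up, and the timestamps are those of changeset 134177840 quoted later in this thread.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def bbox_area_deg2(min_lon: float, min_lat: float,
                   max_lon: float, max_lat: float) -> float:
    """Simple width * height of a bbox in degrees (no projection applied)."""
    return (max_lon - min_lon) * (max_lat - min_lat)


def open_seconds(created_at: str, closed_at: str) -> int:
    """Seconds between created_at and closed_at of a changeset."""
    created = datetime.strptime(created_at, TIMESTAMP_FORMAT)
    closed = datetime.strptime(closed_at, TIMESTAMP_FORMAT)
    return int((closed - created).total_seconds())


# Illustrative bbox only, 2.5 degrees wide and 2 degrees high.
print(bbox_area_deg2(0.0, 0.0, 2.5, 2.0))
print(open_seconds("2023-03-27T13:22:17Z", "2023-03-27T13:22:18Z"))
```

The area is deliberately naive: a degree of longitude shrinks towards the poles, so this is only a rough size indicator, not a surface area.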
How do we replicate the issue ?
- open Network tab in browser dev tools (F12)
- paste and submit a large bbox changeset link in the browser address bar, like https://osmcha.org/changesets/133792960
- observe the empty elements property in the Response tab when clicking the "133792960.json" request for details (filter requests by "s3")
- observe the timeout result after 180s in the Response tab when clicking the "interpreter?..." Overpass request for details (filter requests by "adiff")
- for more examples, click on the "JSON" link in the table below to see the empty elements property
Some recent examples:
| changeset | | | changes actual | changes expected | open for seconds | bbox size deg² | editor |
|---|---|---|---|---|---|---|---|
| 133928860 | OSM | JSON | 0 | 366 | 2 | 5 | iD 2.12.1 |
| 134140877 | OSM | JSON | 0 | 8674 | 215 | 10 | JOSM/1.5 (18678 de) |
| 133768685 | OSM | JSON | 0 | 3 | 3749 | 288 | rosemary v0.4.4 |
| 134177458 | OSM | JSON | 0 | 51 | 2 | 575 | JOSM/1.5 (18678 ru) |
| 134177840 | OSM | JSON | 0 | 54 | 1 | 3945 | iD 2.25.1 |
| 133792960 | OSM | JSON | 0 | 11 | 4031 | 4722 | rosemary v0.4.4 |
Largest working cases I found in my samples:
| changeset | | | changes actual | changes expected | open for seconds | bbox size deg² | editor |
|---|---|---|---|---|---|---|---|
| 133926419 | OSM | JSON | 5 | 5 | 1 | 25 | JOSM/1.5 (18678 de) |
| 134178093 | OSM | JSON | 3 | 3 | 1 | 33 | iD 2.25.1 |
| 133926888 | OSM | JSON | 27 | 27 | 1 | 54 | RapiD 1.1.9 |
Other Information / context:
I'm collecting issues related to Overpass and found three existing issues for failing large bbox changesets. These discuss the obvious client-side adiff query that runs into a timeout, but that is only a fallback.
Instead, I want to focus on the missing features in the S3 JSON, which is really an issue of the server-side processing. That code seems not to be public (?), apart from the parsing part (osm-adiff-parser), so I'm opening the issue here.
- #529
- #548
- #629
@nrenner thank you so much for digging into this and flagging!
It's possible the server running Overpass for OSMCha has gotten a bit rusty and needs a bit of a kick. But yea, this would require logging things inside the AWS infrastructure that runs osm-adiff-parser, etc to figure out where these elements are getting dropped.
Thanks really for the detailed report - we should hopefully be able to follow up on this and debug in a proper way soon.
@batpad thanks for the quick answer!
> It's possible the server running Overpass for OSMCha has gotten a bit rusty
Before making any bigger changes, it might be worth considering alternatives to the current setup. I'm planning to open a separate issue for that.
As an example for checking minutely augmented diffs (see my comment in #651), we can use the empty 134177840.json (OSMCha, OSM), which was open for one second: "created_at":"2023-03-27T13:22:17Z","closed_at":"2023-03-27T13:22:18Z".
The corresponding sequence id for that minute is 5541507.
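For reference, a minimal sketch of how that sequence id can be derived from a changeset timestamp. It assumes the minute offset 22457216 that is commonly cited for Overpass augmented diff ids, and adds 1 so that the result matches the id quoted above. Both are assumptions worth verifying against the Overpass documentation, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: offset between Unix minutes and Overpass augmented diff ids.
ADIFF_MINUTE_OFFSET = 22457216


def adiff_sequence_for(timestamp: str) -> int:
    """Return the augmented diff id assumed to cover the given UTC timestamp."""
    dt = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    unix_seconds = int(dt.replace(tzinfo=timezone.utc).timestamp())
    # The +1 is inferred from this thread: the diff covering 13:22 is 5541507.
    return unix_seconds // 60 - ADIFF_MINUTE_OFFSET + 1


print(adiff_sequence_for("2023-03-27T13:22:17Z"))  # created_at of 134177840
```

A changeset that stays open across several minutes would span several sequence ids, so both created_at and closed_at need to be mapped. Timestamps that fall exactly on a minute boundary may belong to the neighbouring sequence.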
Querying and parsing the augmented diff for that sequence returns the expected 54 changes:
curl "https://overpass.osmcha.org/api/augmented_diff?id=5541507" \
| zx -e "import parser from 'osm-adiff-parser'; let xml = await stdin(); parser(xml, null, (e, json) => { console.log(JSON.stringify(json['134177840'], null, 2)); })" \
| grep 134177840 | wc -l
The query only takes five seconds, so all good and no bbox involved whatsoever.
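The grep above counts every line that mentions the changeset id, including old versions of objects. A slightly stricter count can be sketched in Python with the standard library. This is an illustration, not what OSMCha does server-side: it assumes the usual adiff layout of action elements with old/new children, and the sample XML is hand-written, not real data.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed adiff layout, for illustration only.
SAMPLE_ADIFF = """<osm version="0.6">
  <action type="create">
    <node id="1" version="1" changeset="134177840" lat="0" lon="0"/>
  </action>
  <action type="modify">
    <old><node id="2" version="1" changeset="100" lat="0" lon="0"/></old>
    <new><node id="2" version="2" changeset="134177840" lat="1" lon="1"/></new>
  </action>
  <action type="modify">
    <old><node id="3" version="1" changeset="134177840" lat="0" lon="0"/></old>
    <new><node id="3" version="2" changeset="999" lat="1" lon="1"/></new>
  </action>
</osm>"""


def count_changes(adiff_xml: str, changeset_id: int) -> int:
    """Count actions whose new state was written by the given changeset."""
    wanted = str(changeset_id)
    count = 0
    for action in ET.fromstring(adiff_xml).iter("action"):
        # modify/delete keep the new state under <new>; create has it directly.
        new_parent = action.find("new")
        parent = new_parent if new_parent is not None else action
        if any(el.get("changeset") == wanted for el in parent):
            count += 1
    return count


print(count_changes(SAMPLE_ADIFF, 134177840))
```

In the sample, the third action only mentions the changeset in its old state, so it is not counted, whereas a plain grep would include that line.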
The geohacker diary says
> The augmented diffs are also cached on S3.
It might be interesting to check the contents of that cached sequence (maybe some 5541507.xml or so?). Are they public?
@nrenner from comments from @geohacker in the diary post:
> The state of the latest augmented diff is in a file called latest, like https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/latest.
> You can request for an augmented diff this way: https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/2409184.osc
Not sure if this gives you what you're looking for exactly.
Oh, thanks! I hadn't looked in the comments.
Unfortunately the latest call gives me the sequence 2554267 and that is from 2017 (https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/2554267.osc). Later sequences seem not to be available there.
@nrenner @batpad this is the current S3 URL: https://s3-eu-west-1.amazonaws.com/overpass-db-eu-west-1/augmented-diffs/
Thanks!
All changes there:
curl -s https://s3-eu-west-1.amazonaws.com/overpass-db-eu-west-1/augmented-diffs/5541507.osc \
| grep 134177840 | wc -l
54
So, if this was used, the query part isn't the problem. Maybe writing/updating the JSON fails for some reason or it gets overwritten later, but as there are no further changes in later minutely diffs for this changeset, I can't see a reason why.