fixing-polygons-in-osm

News about the polygon fixing effort

Open joto opened this issue 8 years ago • 48 comments

I will be (mis-)using this issue to occasionally write about the current state of the effort to fix the (multi)polygons in OSM.

After a lot of preparation over the last half year or so (and frequent non-activity in between when I was busy doing other things), the real kickoff for this project was on February 14, 2017, when I posted the first challenges on Maproulette and made them public. Those challenges contained about 6500 self-intersecting building ways around the world split up into seven continent-sized areas. The community answered the call and got to work. Eleven days later, all of them were fixed.

While this was going on I worked on getting more challenges out the door. Next were about 1600 small landuse polygons, each made from a single way with a self-intersection, which I posted on February 21. They were all fixed in five days. I am amazed at how well and quickly the community responded, but there is much more to do and not all tasks will be so simple to fix. I deliberately started with the easy tasks to get things going and give me and everybody else a chance to learn how this can work, before we start working on the bigger problems.
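For readers wondering what "self-intersecting" means in practice, here is a minimal Python sketch. It is not the check used by osm-area-tools or any editor. It assumes a closed way is given as a list of (x, y) coordinate tuples with the first and last point equal, and it only reports segments that properly cross each other; segments that merely touch or overlap are not covered. All function names are made up for illustration.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def is_self_intersecting(ring):
    """ring: list of (x, y) tuples of a closed way (first point == last point)."""
    segments = list(zip(ring, ring[1:]))
    # Brute force over all segment pairs; neighbouring segments share an
    # endpoint and therefore never count as a proper crossing.
    return any(segments_cross(*s, *t) for s, t in combinations(segments, 2))
```

A "bowtie" such as `[(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]` is reported as self-intersecting, while a plain square is not. The pairwise loop is quadratic in the number of nodes, which is fine for small building or landuse ways but would be too slow for huge polygons.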

You can see the result of the effort in the massive drop in this graph (source):

intersection-stats-2017-03-02

The place where the number of errors went up (February 23) was a massive import of broken data that got reverted a few days later. This shows another reason why it is good to fix those old problems: If the number of errors is much lower, anomalies such as broken imports will be seen more easily and we can fix them quickly.

On February 21 I also started another challenge: 1300 open rings from all around the world. As I write this, more than half of them have been fixed already. As these problems are (on average) much harder to fix, a slower pace than with the self-intersections was to be expected. But you can still see the results of the effort in the graph (source):

open-rings-2017-03-02
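As a rough illustration of what an "open ring" is, the following Python sketch looks only at the end nodes of a multipolygon's member ways. It assumes each member way is given as a list of node IDs, which is a made-up input format, and it is not how the statistics above are computed. If some node is the end of an odd number of ways, the rings cannot all be closed there.

```python
from collections import Counter


def open_ring_ends(member_ways):
    """Return node IDs where the member ways of a multipolygon fail to join up.

    member_ways: list of ways, each a list of node IDs in order.
    A closed way contributes its first/last node twice, so it never shows up.
    """
    ends = Counter()
    for nodes in member_ways:
        ends[nodes[0]] += 1
        ends[nodes[-1]] += 1
    # A node used an odd number of times as a way end is a dangling ring end.
    return sorted(node for node, count in ends.items() if count % 2)
```

For two ways `[1, 2, 3]` and `[3, 4, 1]` the result is empty, because together they form one closed ring. For `[1, 2, 3]` and `[3, 4, 5]` the result is `[1, 5]`, the two loose ends a mapper would have to connect.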

I also started another challenge, to fix wrong roles on multipolygon relations. It is not doing as well as the others, but it has only been going for a week now. And it has a lower priority, because those errors don't show up on the map directly. So it is good that mappers are working on the other tasks first. Ideally I'd still want the community to go through all those cases, because they show where the data is bad and problems often come in clusters. One error showing up in the challenge often means there are more around.

You can find all challenges and more information about how to help fix things here.

joto avatar Mar 02 '17 09:03 joto

By now the "Open rings" challenge is also fixed and the "Wrong role" challenge is well on its way. Some people have been very industrious indeed!

Today I am posting four new challenges. Really, it is only one, but I have split it up again into four areas: Africa, the Americas, Asia + Australia, and Europe. Together these are about 2700 closed ways tagged building, landuse, or natural with self-intersections. So this is similar to some of the previous challenges, but it also includes larger polygons and some new tags. You can find the challenges here.

joto avatar Mar 04 '17 08:03 joto

The stats show that, after a very busy weekend, the work has slowed down some, but there is still progress. I don't know what's causing this; maybe it is just that the mappers do less work during the week. Or maybe it is due to new problems being introduced. Over the last weeks, when diving into the data, I have noticed several recent botched imports, some of which have already been reverted. They are not always easy to see before you know where to look. But this is a nice side-benefit of cleaning up the data: You see new problems better. Once we have fixed all the old problems, new problems will show up immediately in the graph.

I am finding those problems while digging into the data and preparing the Maproulette challenges. I am using a mixture of software: C++ programs I have written (osm-area-tools) create a list of all problems. I am using osmium-tool and other programs to do more ad-hoc filtering of OSM data (for instance to get all problems affecting specifically tagged objects). Then everything gets imported into PostgreSQL and I look at the data in QGIS. This allows me to inspect the data "from all sides". And I can easily generate images like this:

heatmap

This is a part of Europe with duplicate nodes in red and a heatmap of the same duplicate nodes in blue. You can easily see the hotspots. A few days ago, one of those hotspots was in Prague. I contacted the local community and they have fixed the problems in a few days.

I am reaching out to other local communities as well, when I see particular problems there or just to get them involved. I'd love to get your help, too, especially in contacting non-English-speaking communities. I am happy to create special Maproulette challenges geared towards area problems in specific communities, or to do special data extracts and the like.

joto avatar Mar 09 '17 10:03 joto

I have just rolled out another batch of challenges, this time with self-intersections in multipolygon relations. I have marked these challenges as "difficult", because some of them concern rather complex multipolygons. There are about 3300 multipolygons here and I have split up everything again, with about 200 to 600 tasks in each challenge.

joto avatar Mar 11 '17 21:03 joto

Let's see if the JOSM start page has some impact :-)

stoecker avatar Mar 11 '17 21:03 stoecker

For those of you who don't know what @stoecker is talking about, this is what the JOSM start page has been showing since yesterday:

josm-startup

Thanks Dirk!

joto avatar Mar 12 '17 08:03 joto

All the challenges I have created so far are about broken (multi)polygons. I haven't even started with old-style multipolygons yet. But that doesn't mean that others aren't busy working on that. I noticed a marked dip in the number of old-style multipolygons in the last few days:

old-style-dip

That's several thousand multipolygons fixed! And if you look at the old-style multipolygon comparison map you can see where a lot of that happened. There are almost no old-style multipolygons left in Austria:

old-style-austria

Looking through the changesets, I found this is the work of nebulon42. Thanks @nebulon42.

If you want to do the same, just pick an area from the comparison map and get going!

joto avatar Mar 12 '17 09:03 joto

> If you want to do the same, just pick an area from the comparison map and get going!

Yes! And Michael even posted a handy overpass query to find such multipolygons on osm talk: http://overpass-turbo.eu/s/nrg (or use the bbox-version: http://overpass-turbo.eu/s/nri)

Using this query and loading the result into JOSM/Level0/… makes it much quicker to find and fix such multipolygons.

@joto maybe you can put a link to this query somewhere on the site?

tyrasd avatar Mar 12 '17 15:03 tyrasd

In JOSM you can use File --> "Download from Overpass API..." with the Overpass query:

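// XML output, 60 s timeout, limited to the bounding box chosen in the download dialog
[out:xml][timeout:60][bbox:{{bbox}}];
(
   // multipolygon relations whose only tag is type=multipolygon
   relation["type"="multipolygon"](if:count_tags()==1);
);
// also fetch the member ways and their nodes
(._;>;);
out meta;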

Then select a Bounding box and "Download".

After downloading, run the JOSM validator and you will see warnings for all the problems in the selected area.
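The `{{bbox}}` placeholder in the query above is filled in by the tool (JOSM or overpass turbo), not by the Overpass server itself. If you want to use the same query from a script, you have to put in the coordinates yourself. The following is a minimal Python sketch that only builds the query text; the function name and parameters are made up for illustration, and sending the request is left out on purpose.

```python
def old_style_query(south, west, north, east, timeout=60):
    """Build the Overpass QL query from this thread for an explicit bounding box.

    Overpass expects the global bbox as south,west,north,east in degrees.
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:xml][timeout:{timeout}][bbox:{bbox}];\n"
        "(\n"
        '  relation["type"="multipolygon"](if:count_tags()==1);\n'
        ");\n"
        "(._;>;);\n"
        "out meta;\n"
    )


if __name__ == "__main__":
    # Example bounding box; the coordinates are arbitrary.
    print(old_style_query(47.0, 9.5, 49.0, 17.2))
```

The resulting text can then be passed to whatever Overpass instance you normally use. Keep the bounding box small, because the query also pulls in all member ways and nodes.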

danfos avatar Mar 13 '17 16:03 danfos

@tyrasd @danfos Please keep this issue for news reports and open separate issues for other things. Thanks.

joto avatar Mar 13 '17 19:03 joto

Several people have already started fixing old-style multipolygons and have posted tips on how to approach this using Overpass queries and JOSM. I have assembled this information, added some of my own, and put it into a manual.

joto avatar Mar 14 '17 12:03 joto

I noticed that in the last few days the number of segments with the wrong role has been going up. I looked into this and I think this is a side-effect of the fixing effort. Wrong roles will only be detected for multipolygons that don't have any other problems, so fixing those other problems will lead to more wrong roles being detected. Even if you are very diligent when fixing something as complex as those multipolygon relations, new errors will be made and some problems will slip through. That's not really a big deal, we'll detect them and fix them later.

But still, here is a reminder: If you are fixing multipolygons, always also check the roles and correct them.

joto avatar Mar 14 '17 13:03 joto

A month ago I launched this effort to get the (multi)polygons fixed. Over 150 mappers have contributed so far, some with thousands and thousands of edits! Different fixes have been done, but the focus was on the self-intersections and, after this month, more than half of them have been fixed:

intersections-half-point

This is an awesome achievement!

joto avatar Mar 15 '17 10:03 joto

Some of the challenges posted have been quite difficult to fix. But here is an easy one that also allows mappers to help who are not that familiar with relations: Ways that contain only a single node. Sometimes the same node is in the way multiple times. Some of these will be detected as polygons, because the first and last node are the same, so they are closed ways. They contain neither a proper line geometry nor a proper polygon geometry, so they need to be removed or fixed. Look at the details.
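The condition described here is simple enough to write down. A minimal Python sketch, assuming a way is given as a list of node IDs (the function name is made up, and this is not the code behind the challenge):

```python
def is_single_node_way(node_refs):
    """True if a way references exactly one distinct node, possibly several times."""
    return len(node_refs) > 0 and len(set(node_refs)) == 1
```

A way `[5]` and a way `[5, 5, 5]` both match. The second one also looks "closed", because its first and last node are the same, which is why such ways can end up being treated as polygons.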

joto avatar Mar 15 '17 13:03 joto

The old-style multipolygon comparison map just got iD and JOSM buttons in the upper right corner that make editing the data a snap. (The buttons will turn red for a second if you are not at zoom level 15 or higher, or if JOSM isn't running or its remote control is not available.)

And the challenges are almost all done again. I'll create some more for you soon...

joto avatar Mar 17 '17 21:03 joto

All challenges were done, so here is the next one. This is a bit more challenging to describe, but often pretty easy to fix thanks to the magic of JOSM. Get started at Duplicate segments in closed ways.
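To give an idea of what a duplicate segment is, here is a minimal Python sketch. It assumes a way is given as a list of node IDs and treats a segment as the connection between two consecutive nodes regardless of direction; that direction is ignored is an assumption of this sketch, not something taken from the challenge description.

```python
from collections import Counter


def duplicate_segments(node_refs):
    """Return the segments (as sorted node ID pairs) that occur more than once in a way."""
    counts = Counter(
        frozenset((a, b))
        for a, b in zip(node_refs, node_refs[1:])
        if a != b  # ignore zero-length segments from repeated nodes
    )
    return sorted(tuple(sorted(seg)) for seg, n in counts.items() if n > 1)
```

For the closed way `[1, 2, 3, 2, 4, 1]` this returns `[(2, 3)]`, because the way runs from node 2 to node 3 and straight back again. A clean ring such as `[1, 2, 3, 1]` returns an empty list.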

joto avatar Mar 18 '17 19:03 joto

I'll be at the FOSSGIS conference in Passau, Germany, next week. Catch me there to talk about the area fixing effort (or anything else). On Saturday there will be an OSM unconference where I am planning a session on the area fixing effort.

joto avatar Mar 18 '17 19:03 joto

Now that the fixing effort is making progress, I have been focussing some more on getting the word out.

I have contacted key software developers for OSM, especially the editor developers, usually through their ticketing systems. I am tracking this on our issue #23. I have also contacted several local communities and started talking with the HOT community to tell everybody about what's happening and make sure as many community members as possible are informed and involved in this process.

If you know about anybody else who should know about this effort, tell them, or tell me to tell them.

The feedback I got so far has been really positive. The only real criticism I heard was that the documentation is too technical. So that is something we need to work on. I appreciate any help on that!

joto avatar Mar 21 '17 13:03 joto

While I was at the FOSSGIS conference you worked through all the challenges. Time to add some more. I have just added two new challenges. One is a Maproulette challenge (split up into 4 continent-sized bits) with another batch of Open Rings. This time there is no limit on the size of the multipolygons involved. Some are huge!

The other challenge is a bit different: There are a great many multipolygon problems in South Korea. I have not shown them in previous challenges, because there are so many. I think it is better to fix these by going through them using the OSM Inspector. So I posted a challenge called Fixing multipolygons in South Korea.

joto avatar Mar 29 '17 15:03 joto

I know that's no news, but it probably saves a lot of time: I fixed some polygons with 100-1000 members that were missing roles completely. Here is a short guide on how to do this quickly in JOSM:

  • Open relation in editor - set all roles to outer (select all entries and enter "outer" in the box at the bottom) - close relation editor
  • Start validator - it will complain about roles which should be inner
  • Select all the related warnings and click "select" button in Validator
  • Open relation and apply "inner" role (all the elements are preselected)
  • Rerun validator to check if all is ok.

First fix all geometry related issues!
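The idea behind the steps above, that a ring lying inside another ring should be an inner, can be sketched in a few lines of Python. This is only an illustration and not how the JOSM validator is implemented. It assumes every ring is already closed and valid, that rings neither touch nor cross, and that they are given as lists of (x, y) tuples keyed by a made-up name.

```python
def point_in_ring(pt, ring):
    """Ray casting test: is pt strictly inside the closed ring?"""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def expected_roles(rings):
    """rings: dict name -> closed ring. Even nesting depth means outer, odd means inner."""
    roles = {}
    for name, ring in rings.items():
        depth = sum(
            point_in_ring(ring[0], other)
            for other_name, other in rings.items()
            if other_name != name
        )
        roles[name] = "outer" if depth % 2 == 0 else "inner"
    return roles
```

With a large square, a small square inside it, and a third square far away, the small one comes out as `inner` and the other two as `outer`. This only works once the geometry is clean, which is why the geometry problems have to be fixed first.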

stoecker avatar Mar 29 '17 17:03 stoecker

There is some amazing work going on switching old-style multipolygons to new-style tagging. I have created this little movie showing the vanishing old-style multipolygons around the world. Many countries are already done!

old-style-map-animation

This movie was created from the same data you see on the comparison map overlay. It shows all nodes in all relations tagged type=multipolygon which have no other tags. So it doesn't show old-style multipolygons that have, for instance, a created_by tag. This was an oversight on my part that I'll fix eventually. So, sorry folks, there are some more old-style multipolygons in those areas that we have to fix. But there are only about 6,000 of them. That is nothing compared to the roughly 80,000 that were already fixed. (The statistics have the correct number of all old-style multipolygons.)
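The difference between the two ways of counting can be sketched in Python. This is an illustration only: it assumes a relation's tags are available as a dict, and the set of keys to ignore is hypothetical apart from created_by, which is the only example named above.

```python
# Keys that do not say anything about what the area is.
# Only created_by is mentioned in this thread; extend the set as needed.
IGNORED_KEYS = {"type", "created_by"}


def looks_old_style(tags):
    """True for a multipolygon relation that carries no meaningful tags itself."""
    if tags.get("type") != "multipolygon":
        return False
    return not (set(tags) - IGNORED_KEYS)
```

A relation with only `type=multipolygon` matches, and so does one with an additional `created_by` tag, which the stricter "no other tags at all" rule would miss. A relation that also has `landuse=forest` does not match.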

joto avatar Mar 31 '17 13:03 joto

Nice! I'd also suggest evaluating the number of multipoly relations without an area tag (irrespective of which other tags they carry). A rough tag list is in #17 (last entry at the moment), which should cover the vast majority of cases (more details would be in the style sheets).

wolfbert avatar Mar 31 '17 17:03 wolfbert

Sometime in the last few days we passed the half-way point of fixing the old-style multipolygons. Fixing around 120,000 multipolygons took us about a month. Let's see how fast we can do the other half!

Here is how the map looked this morning:

old-style-mps-2017-04-10

Africa is done, Australia is done, as are huge parts of the other continents. Thanks to everybody who is helping out here! As I mentioned before, the map was only showing old-style multipolygons that have no other tags except the type tag. This isn't quite all of them. Some have a created_by tag for instance. I have now corrected the map, so it shows some more multipolygons including some huge ones, mostly in Russia. So there are some more angry red dots on the map again, sorry. But the statistic was correct, so there isn't actually more work, it only appears so. :-)

joto avatar Apr 10 '17 09:04 joto

The currently running "Open Rings" challenges have been going slower than other challenges before. That was to be expected because there are some tasks in there that are really hard to fix. And many that can't be fixed at all without local knowledge.

In Maproulette users can mark tasks as "Too hard" or just skip them. Unfortunately those tasks will just show up again and again (for the same user or other users), which makes it difficult to get to the new tasks. I have now deleted all tasks marked as "Skipped" or "Too hard" from the challenges. This should make it easier to get to the other tasks that nobody has seen before.

joto avatar Apr 13 '17 07:04 joto

By now the "Open Rings" challenge for Europe is done, but there are a few hundred more to look at in the rest of the world. It would be great if we could get through them. Just don't spend too much time on any task: if it is not immediately obvious what the solution is, mark it as "Too hard" and move on.

If you don't like the "Open rings", I have added a new challenge called Duplicate Ways. This contains all cases where the same way is in a multipolygon relation twice or more times. That's always wrong and, in JOSM at least, easy to fix. The JOSM relation editor shows these ways with a reddish background.
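For illustration, a minimal Python sketch of the condition behind this challenge. It assumes the relation members are available as (type, ref, role) tuples, which is a made-up input format, and the function name is hypothetical.

```python
from collections import Counter


def duplicate_member_ways(members):
    """Return IDs of ways that appear more than once in a relation's member list.

    members: iterable of (member_type, ref, role) tuples.
    """
    counts = Counter(ref for mtype, ref, _role in members if mtype == "way")
    return sorted(ref for ref, n in counts.items() if n > 1)
```

For members `[("way", 10, "outer"), ("way", 11, "outer"), ("way", 10, "inner")]` the result is `[10]`. Note that the role does not matter here: the same way listed twice is wrong even if the two entries have different roles.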

joto avatar Apr 18 '17 13:04 joto

At the recent FOSSGIS conference we had an "OSM Saturday". I hosted a workshop about the area fixing effort. A video of my talk and the following discussion (all in German) is available for download and on YouTube.

joto avatar Apr 22 '17 08:04 joto

Here is a new challenge for you: about 3000 building ways with spikes. Those are a subset of the cases that show up as duplicate segments in the statistics. Some of them are really easy to fix, just delete one node. But some of them are trickier.

Again, I have split up this challenge into 5 sub-challenges for different areas: Africa, Americas, Asia + Australia, Europe, and, for the first time, one extra challenge for HOT activation areas.

(I just removed all the challenges and created new ones. The old ones contained zero-length spikes, which happened if there were duplicated nodes and which was confusing. The new ones should not have them.)
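A spike, in the simplest case, is a node where the way turns around and runs straight back to the node it just came from. A minimal Python sketch of that condition, assuming a closed way is given as a list of node IDs with the first ID repeated at the end (this is not the code used to build the challenge, and it only looks at node IDs, not at coordinates):

```python
def spike_tips(closed_way):
    """Return node IDs at which a closed way immediately doubles back on itself."""
    ring = closed_way[:-1]  # drop the repeated closing node
    n = len(ring)
    tips = []
    for i in range(n):
        before = ring[i - 1]        # wraps around to the last node for i == 0
        after = ring[(i + 1) % n]   # wraps around to the first node at the end
        if before == after and before != ring[i]:
            tips.append(ring[i])
    return tips
```

For `[1, 2, 3, 4, 3, 5, 1]` this returns `[4]`: the way goes 3, 4, 3, so node 4 is the tip of a spike and deleting it from the way is the easy fix mentioned above. A plain ring such as `[1, 2, 3, 4, 1]` returns an empty list.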

joto avatar Apr 27 '17 13:04 joto

Looking at the stats today you might have noticed a huge jump in the number of duplicated segments.

duplicate_segments_stats_2017-04-28

This jump is due to me being conservative before and not counting some duplicated segments that I was not sure were actual problems. I have changed this now to better show the number of problems at the price of some overreporting.

Oh, and before anybody freaks out about the huge numbers: the numbers reported here are segments (the connection between two nodes). Because most ways contain many segments, the number of closed ways or multipolygon relations affected is much lower. About 7000 ways and 21000 relations are affected.

joto avatar Apr 28 '17 16:04 joto

After exactly a month the "open rings" challenge is finally finished. This has been the most difficult challenge so far and there were many cases where a fix wasn't possible because the data needed to fix it is just not there. Boundaries can't be seen on satellite images for instance. But still we got nearly half of the about 10,000 cases fixed, so I count this as a success. I think we can't do much more here at the moment, but will keep thinking about how to address this in the future, possibly by involving the local communities more.

joto avatar Apr 30 '17 15:04 joto

The old-style multipolygon relations are history! In not even two months the OSM community cleaned up all of the nearly quarter of a million relations:

old-style-stats-2017-05-04

This is much faster than I (and probably everybody else) had anticipated. There are still a few old-style multipolygons around: some of them have no members at all, some have only relation members (which isn't allowed for multipolygon relations), and some have been created in the last few days. I expect that we will get new ones occasionally from editors and/or mappers who don't know yet that they shouldn't do that, but that's not a big problem.

Here is an animation showing how the old-style multipolygons vanish bit by bit:

old-style-map-animation

So that part of the great (multi)polygon fixing effort is done. Huge thanks to everybody involved! But there are still geometry errors to fix.

joto avatar May 04 '17 14:05 joto

The "building ways with spikes" challenge is finished. Now on to the rest of the closed ways with spikes: about another 2000 closed ways that have the same kind of spikes in them.

joto avatar May 07 '17 09:05 joto