abstreet

Importing Berlin

Open jvolker opened this issue 5 years ago • 62 comments

I've used ./importer --oneshot=/ with the binary version and this (extracted) OSM file: http://download.geofabrik.de/europe/germany/berlin-latest.osm.bz2

The importer crashes with Unknown turn restriction no_entry:

- Running convert_osm on /Users/jerry/Downloads/berlin-latest.osm
Read /Users/jerry/Downloads/berlin-latest.osm (1,040)... 162.1816s
OSM doc has 5522394 nodes, 818090 ways, 14293 relations
processing OSM nodes (5,522,394)... 4.4821s
processing OSM ways (818,090)... 6.3400s
Relation 1653527 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 3455436 has unhandled member role inner, ignoring it
Relation 2107105 has unhandled member role inner, ignoring it
thread 'main' panicked at 'Unknown turn restriction no_entry', map_model/src/raw.rs:373:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
dropping Timer while doing progress processing OSM relations, due to panic?

jvolker avatar Jul 02 '20 16:07 jvolker

The OSM importing code is extremely strict about stuff I haven't seen before, to prevent regressions when converting regularly used maps. This one is easy to fix (edit map_model/src/raw.rs:373) if you've built from source. You're bound to hit a few more issues though. I'll download the Berlin map and fix whatever I encounter.
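A minimal sketch of the "warn and skip" alternative to panicking on an unfamiliar value, in the style of the "unhandled member role" log lines above. It is written in Python for illustration and is not the code in map_model/src/raw.rs. The table lists restriction values documented for OSM `restriction=*` relations; the function name, the ban/only classification labels and the relation IDs in the usage note are assumptions made for this example.

```python
import logging

# Values documented for OSM `restriction=*` relations, classified as a ban
# ("no_*") or a mandatory movement ("only_*"). The labels are illustrative.
KNOWN_RESTRICTIONS = {
    "no_left_turn": "ban",
    "no_right_turn": "ban",
    "no_straight_on": "ban",
    "no_u_turn": "ban",
    "no_entry": "ban",
    "no_exit": "ban",
    "only_left_turn": "only",
    "only_right_turn": "only",
    "only_straight_on": "only",
}


def parse_restriction(value, relation_id):
    """Return (kind, value) for a known restriction, or None after a warning."""
    kind = KNOWN_RESTRICTIONS.get(value)
    if kind is None:
        # Log and skip instead of aborting the whole import.
        logging.warning(
            "Relation %s has unknown turn restriction %r, ignoring it",
            relation_id,
            value,
        )
        return None
    return (kind, value)
```

For example, `parse_restriction("no_entry", 1)` returns `("ban", "no_entry")`, while an unrecognised value is logged and yields `None`. The trade-off is the one described above: skipping keeps the import running, but silently changes behaviour on maps that are converted regularly.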

dabreegster avatar Jul 02 '20 16:07 dabreegster

The entire Berlin area is huge. Could we start with a smaller area? Not neighborhood small, maybe the general city center area, just not 1.1GB. :) I use geojson.io or geoman.io to draw polygon boundaries.

dabreegster avatar Jul 02 '20 16:07 dabreegster

Wow, thanks for doing this.

Sorry. I didn't know how much this file covered.

Could we start with a smaller area?

Of course, how about this for the core city centre:

"coordinates": [
          [
            [
              13.361434936523436,
              52.49532344352079
            ],
            [
              13.432674407958984,
              52.49532344352079
            ],
            [
              13.432674407958984,
              52.53199183474197
            ],
            [
              13.361434936523436,
              52.53199183474197
            ],
            [
              13.361434936523436,
              52.49532344352079
            ]
          ]
        ]

and this for a larger part of the city which is surrounded by the circle train:

"coordinates": [
          [
            [
              13.271141052246094,
              52.45935663683681
            ],
            [
              13.47644805908203,
              52.45935663683681
            ],
            [
              13.47644805908203,
              52.561116680853836
            ],
            [
              13.271141052246094,
              52.561116680853836
            ],
            [
              13.271141052246094,
              52.45935663683681
            ]
          ]
        ]

I can try to create those filter files.
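A minimal sketch of turning one of the GeoJSON rings above into an Osmosis-style polygon filter (.poly) file: a name line, a section line, one "lon lat" pair per line, then END for the section and END for the file. The function name and the section label "1" are choices made for this example; the coordinates are the city-centre ring posted above.

```python
# City-centre ring from the GeoJSON above, as [lon, lat] pairs (closed ring).
CENTRE_RING = [
    [13.361434936523436, 52.49532344352079],
    [13.432674407958984, 52.49532344352079],
    [13.432674407958984, 52.53199183474197],
    [13.361434936523436, 52.53199183474197],
    [13.361434936523436, 52.49532344352079],
]


def ring_to_poly(name, ring):
    """Render a single closed [lon, lat] ring as polygon filter file text."""
    lines = [name, "1"]  # file name line, then the label of the only section
    for lon, lat in ring:
        lines.append(f"    {lon}    {lat}")
    lines.append("END")  # closes the section
    lines.append("END")  # closes the file
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ring_to_poly("berlin-centre", CENTRE_RING), end="")
```

Redirecting the output to a file such as berlin-centre.poly gives a boundary that can be passed to tools that accept this format. Polygons with holes or several outer rings would need additional sections, which this sketch does not handle.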

jvolker avatar Jul 02 '20 16:07 jvolker

Filter files: berlin-polys.zip

I've tried another time with berlin-centre.poly but am still getting the same error.

jvolker avatar Jul 02 '20 16:07 jvolker

berlin

Thanks! I got the large one working. You can put this file in data/system/maps: https://www.dropbox.com/s/4uia1pk9aj1d8gd/berlin.bin?dl=0 My GPU half-melted during loading and my mouse was briefly unresponsive, but in the end, it only took 20s to load. :)

I'll commit a fix in a moment that deals with almost all of the issues. There are a few intersections where seemingly different combinations of roads wind up with exactly the same turn geometry, which breaks some assertions. I want to figure out why that's happening instead of relaxing that code permanently.

dabreegster avatar Jul 02 '20 16:07 dabreegster

parking

Interesting: I think Berlin is the first place I've run across with OSM parking tags filled out somewhat. It's been quite painful mapping them in Seattle. Odd that the coverage in Berlin is so spotty; IIRC from a prior visit, there's definitely more than just these areas.

dabreegster avatar Jul 02 '20 16:07 dabreegster

Wow! That's amazing. Thanks a lot for your quick help.

Unfortunately, abstreet is crashing when trying to load the Berlin map. output.txt contains:

Loading map ../data/system/maps/berlin.bin

Reading ../data/system/maps/berlin.bin: 0/149 MB... 0.0005s

../data/system/maps/berlin.bin is missing or corrupt. Check https://github.com/dabreegster/abstreet/blob/master/docs/dev.md and file an issue if you have trouble.

invalid value: integer `512`, expected variant index 0 <= i < 8

Interesting: I think Berlin is the first place I've run across with OSM parking tags filled out somewhat.

That's great!

jvolker avatar Jul 02 '20 17:07 jvolker

I found what I think is a problem with Bayernring in OSM:

  • https://www.openstreetmap.org/way/722013503
  • https://www.openstreetmap.org/way/722013169

The first one is an overlapping road of the second. I think the second one needs to be split at the point where the road attributes are different. Are you part of the OSM community in Berlin? If not, I can attempt a fix or raise the issue on the appropriate Slack.

The crash is because the binary map format has changed since Sunday. That's the price of rapid iteration. :( If you can build the game crate from source, it'll handle the new format. If not, https://github.com/dabreegster/abstreet/suites/866601914/artifacts/10031908 is a Mac binary from last night that should work

dabreegster avatar Jul 02 '20 17:07 dabreegster

The new Mac binary is working with the Berlin map. Fantastic! Thanks so much.

Are you part of the OSM community in Berlin?

No sorry, I'm not. But I'm part of CityLAB Berlin. I should introduce myself properly in the next few days. And maybe my colleagues can answer some of your questions on the Berlin data. Let me know if you have any more questions.

Thanks so much! 🙏 That was the quickest support.

jvolker avatar Jul 02 '20 17:07 jvolker

About the parking data, my colleague @Lisa-Stubert mentioned an official data set from 2014 that contains this data in some format. It might be useful to fill in the gaps on OSM.

FIS-Broker > Straßenbefahrung > Parkfläche > WFS address

image

The data set contains polygons. @Lisa-Stubert mentioned we might need lines instead.

jvolker avatar Jul 03 '20 11:07 jvolker

The direct link to WFS address seemed broken, and I couldn't find Parkfläche on the first page. Is there a different direct link?

Seattle has a perhaps similar dataset, blockface. I'm matching that to the nearest side of each road here. We could do something similar for this dataset.
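A minimal sketch of what "matching to the nearest side of each road" could look like for a polygon dataset: reduce each parking polygon to a representative point (for example its centroid), find the closest road segment, and use the sign of a cross product to decide whether the point lies left or right of the road's direction. This is an illustration under stated assumptions, not the matching code used for Seattle. Coordinates are assumed to be planar metres, each road is simplified to one straight segment, and all names are hypothetical.

```python
import math


def side_of_road(a, b, p):
    """'left' or 'right' of the directed segment a->b (points on the line count as right)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return "left" if cross > 0 else "right"


def distance_to_segment(a, b, p):
    """Shortest planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def match_to_road_side(roads, p, max_dist):
    """roads maps road_id -> (a, b). Returns (road_id, side), or None if nothing is close enough."""
    best = None
    for road_id, (a, b) in roads.items():
        d = distance_to_segment(a, b, p)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, road_id, side_of_road(a, b, p))
    return None if best is None else (best[1], best[2])
```

With two parallel roads `{"r1": ((0, 0), (100, 0)), "r2": ((0, 50), (100, 50))}`, a point at `(50, 5)` matches the left side of r1 and a point at `(50, 45)` matches the right side of r2. A distance cut-off keeps polygons that are far from any road from being attached to one.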

Ideally the data winds up in OSM, so anybody can use it, and it can be maintained more easily. OSM has strict rules around importing data automatically. Regardless of an import, there's a tool in A/B Street to update parking tags in OSM a little more conveniently than other editors.

dabreegster avatar Jul 03 '20 16:07 dabreegster

The direct link to WFS address seemed broken, and I couldn't find Parkfläche on the first page. Is there a different direct link?

I'm not familiar with the workflow yet, but a former colleague has written this guide on how to use that data source: https://lab.technologiestiftung-berlin.de/projects/fisbroker-to-qgis/en/

Apparently you can also load it via command line (with GDAL installed): ogr2ogr -f gpkg Parkflächen.gpkg WFS:"https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/s_Parkflaeche"

I'm going to get back to this next week.

jvolker avatar Jul 03 '20 18:07 jvolker

https://fbinter.stadt-berlin.de/fb/index.jsp?loginkey=zoomStart&mapId=k_StraDa@senstadt&bbox=383640,5815972,386377,5817593 this is the direct link to the data set of the "Straßenbefahrung". It contains all possible areas and objects of road traffic for Berlin. The dark grey areas are the parking areas. They can only be downloaded via the WFS service @jvolker posted.

Lisa-Stubert avatar Jul 06 '20 08:07 Lisa-Stubert

I've just tried to load the berlin map in A/B street compiled from source on Windows and received this error:

switch map...
Wrote ../data/player/camera_state/montlake.json
load map...
Loading map ../data/system/maps/berlin.bin
Reading ../data/system/maps/berlin.bin: 0/149 MB... 0.0001s
memory allocation of 14934484003840414750 bytes failed
error: process didn't exit successfully: `C:\Users\PC\Desktop\AB-Street\abstreet\target\release\game.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

Do you have any idea what could cause this? Thanks.

jvolker avatar Jul 07 '20 17:07 jvolker

The binary format of the map changed yesterday.

I can regenerate the Berlin map and give you a new dropbox link in the next ~30 mins
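A minimal sketch of why a stale file can produce an absurd allocation request like the one in the log above. It assumes a length-prefixed binary layout, which is common for compact serialization formats. The two layouts and all names are hypothetical and do not describe the real map format. When the field order changes, a reader built for the new layout interprets unrelated bytes as a length.

```python
import struct


def write_old(speed, name):
    """Hypothetical old layout: an f64, then a u64 length-prefixed string."""
    raw = name.encode("utf-8")
    return struct.pack("<d", speed) + struct.pack("<Q", len(raw)) + raw


def write_new(name, speed):
    """Hypothetical new layout: the length-prefixed string comes first."""
    raw = name.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw + struct.pack("<d", speed)


def read_new(buf):
    """Decode the new layout, rejecting length prefixes that cannot be real."""
    (length,) = struct.unpack_from("<Q", buf, 0)
    remaining = len(buf) - 8
    if length > remaining:
        # Without this guard, a reader would try to allocate `length` bytes.
        raise ValueError(
            f"claimed length {length} exceeds the {remaining} bytes left; "
            "stale or corrupt file?"
        )
    name = buf[8:8 + length].decode("utf-8")
    (speed,) = struct.unpack_from("<d", buf, 8 + length)
    return name, speed
```

Reading `write_new("Bayernring", 13.4)` with `read_new` round-trips cleanly, while feeding it `write_old(13.4, "Bayernring")` raises the error, because the bit pattern of the float is read as a length of several quintillion bytes. Regenerating the file with the matching build, as offered above, is the actual fix.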

dabreegster avatar Jul 07 '20 18:07 dabreegster

https://www.dropbox.com/s/2l2muvha0dubuvp/berlin.bin?dl=0

The simulation crashes 3m58s in. I can investigate soon; I suspect this is the same as https://github.com/dabreegster/abstreet/issues/143#issuecomment-654488564 -- there are some bus stops mis-tagged in OSM. I'll improve the error message to find these better, or filter out the invalid ones.

dabreegster avatar Jul 07 '20 18:07 dabreegster

Fixed Berlin map: https://www.dropbox.com/s/2l2muvha0dubuvp/berlin.bin?dl=0

The possible bus route issues:

Route Bus M76: U Walther-Schreiber-Platz <=> S Lichtenrade has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 43238974, i1: OriginalIntersection { osm_node_id: 27433695 }, i2: OriginalIntersection { osm_node_id: 27434791 } }
Route Bus 277: S+U Hermannstraße <=> Marienfelde, Stadtrandsiedlung has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 391673769, i1: OriginalIntersection { osm_node_id: 3726792172 }, i2: OriginalIntersection { osm_node_id: 3708789255 } }
Route Bus N50: U Tierpark <=> Buchholz-West/Hugenottenplatz has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 626027608, i1: OriginalIntersection { osm_node_id: 5910363710 }, i2: OriginalIntersection { osm_node_id: 2519032893 } }
Route Bus X54: S+U Pankow <=> U Hellersdorf has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 4783453, i1: OriginalIntersection { osm_node_id: 30625193 }, i2: OriginalIntersection { osm_node_id: 2101949795 } }
Route Bus N5: S+U Alexanderplatz <=> Hellersdorf, Riesaer Straße has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 139877314, i1: OriginalIntersection { osm_node_id: 270718978 }, i2: OriginalIntersection { osm_node_id: 7312391564 } }
Route Buslinie M29 has two bus stops seemingly out of order somewhere on OriginalRoad { osm_way_id: 451544125, i1: OriginalIntersection { osm_node_id: 294494421 }, i2: OriginalIntersection { osm_node_id: 845075600 } }

These stops are just skipped for now. Not high priority to figure out what's wrong here.
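A minimal sketch of one way to detect and skip stops that appear out of order, assuming each stop has already been assigned a distance along the route. The thread does not say how the importer performs this check, so the rule, the function name and the stop names in the usage note are all assumptions made for illustration.

```python
def drop_out_of_order(stops):
    """Split stops into (kept, skipped) so that kept distances never decrease.

    stops: list of (name, distance_along_route_m) in the order the route lists them.
    """
    kept, skipped = [], []
    last = float("-inf")
    for name, dist in stops:
        if dist >= last:
            kept.append(name)
            last = dist
        else:
            # This stop lies behind the previous kept stop, so it is skipped.
            skipped.append(name)
    return kept, skipped
```

For `[("A", 0), ("B", 400), ("C", 250), ("D", 900)]` this keeps A, B and D and skips C. A greedy rule like this can skip the wrong stop when the earlier one is the mis-tagged one, which is one reason a better error message is useful for fixing the data upstream.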

dabreegster avatar Jul 07 '20 19:07 dabreegster

Thanks. It's working again.

jvolker avatar Jul 08 '20 08:07 jvolker

Just checking, are you blocked on me for anything besides importing and matching parking data? (which I'm not likely to get to anytime super soon)

dabreegster avatar Jul 11 '20 23:07 dabreegster

Thanks for asking @dabreegster. It seems the most important thing is to get trip data to get a more realistic simulation. We are currently looking into data sources and are going to post them here.

jvolker avatar Jul 13 '20 15:07 jvolker

Hey Dustin!

So we've done some brainstorming and come up with some additional data sources that might be useful for the project. Maybe you can take a look and give us an indication which data sources you see as more or less useful, or what's still missing?

The biggest thing we don't have right now is any sort of traffic pattern model like the SoundCast model. It's unclear to me to what extent this data exists in Berlin at all, and to the extent it does, it's not available as open data and we currently lack the contacts to rustle up that data. Obviously we'll keep exploring that, because it's something we're very interested in, but I don't know how soon we would have something concrete to offer there, if that even happens at all.

So what do we have?

  • Public transit data (GTFS): https://daten.berlin.de/datensaetze/vbb-fahrplandaten-gtfs (available as a regularly-updated upload; there is also an API, but it requires first contacting the local transit operator to get access)
  • Car traffic speed data: https://movement.uber.com/explore/berlin/speeds/query?dt%5Btpb%5D=ALL_DAY&dt%5Bwd;%5D=1,2,3,4,5,6,7&dt%5Bdr%5D%5Bsd%5D=2020-03-01&dt%5Bdr%5D%5Bed%5D=2020-03-31&ff=&lang=en-US (this is a dataset made available by Uber; shows average speeds over various road segments in both directions – could be useful as a proxy for traffic flows?)
  • Traffic volume data: https://fbinter.stadt-berlin.de/fb/index.jsp?loginkey=zoomStart&mapId=wmsk_07_01verkmeng2014@senstadt&bbox=388185,5818529,395105,5822530 (This is a very general dataset, available via WFS, showing estimated number of vehicles driving over major streets in a 24 hour period – this data is based on traffic counts originally gathered in 2014, so it's pretty outdated)

What might we have?

I looked at the dataset containing information on "blockfaces" to see what it contained (wasn't familiar with this term myself). Seems like it includes data on things like zones (e.g. loading and unloading zones, no parking zones, etc.) as well as metered parking/parking fees. Not sure what out of the blockfaces dataset is most relevant for A/B Street – if it's just about knowing where there is streetside parking, then the aforementioned "Straßenbefahrung" dataset is probably our best bet there. If you also want information on no-parking zones etc., I think we can possibly also get those out of the Straßenbefahrung dataset, but I'm not certain. We definitely won't be able to get good data on where parking is metered (and under what circumstances); there isn't currently a comprehensive city-wide dataset for that.

We may be able to rustle up some data on off-street parking – @Lisa-Stubert wanted to check out another large geospatial dataset we have to see if parking lots and parking garages are in it (although if they are, my guess is we won't have data on capacities, but maybe I'll be proven wrong).

Let me know what questions you have. If there's an acute need for data that has not yet been addressed, also let me know about that and we can go sleuthing...

tori-d avatar Jul 13 '20 15:07 tori-d

Thanks for doing this research!

  • The GTFS upload will be useful eventually. The transit modelling needs lots of work first; currently just a single bus is created per route, and it runs all day. No need for API access; the schedule information is used when building the map as a one-time process.
  • Speed and traffic volume could be helpful. I've seen research papers before talking about how to use volume data to infer a set of start/end points that would cause the measured volume counts. I have no familiarity with the algorithms for doing this; it'd be future work to implement them or find an existing tool that does it.
  • Straßenbefahrung: It sounds like this is the equivalent of the Seattle blockface data indeed. We could use it just to infer the existence of space along the shoulder of the road for street parking. The data in Seattle wasn't adequate; it says "no parking restrictions" on narrow roads that physically had no space for parking. So I use it as a first pass, manually look at areas of the map with odd road geometry, then fix upstream in OpenStreetMap using a tool.
  • There's no modelling of metered parking, resident-only restrictions, 2 hours max, etc yet. No immediate plans to figure this out either.
  • Offstreet parking with garages/lots could be helpful. We can estimate capacity from the geometry. Some lots are already in OSM: https://www.openstreetmap.org/way/669540789
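A minimal sketch of estimating lot capacity from geometry: project the lon/lat outline to local metres, compute the area with the shoelace formula, and divide by an assumed area per parking space. The 30 m² per space figure (stall plus a share of the aisles) is an assumption for this example, as are the function names. The projection is a simple equirectangular approximation that is only reasonable for small polygons.

```python
import math

EARTH_RADIUS_M = 6371000.0


def polygon_area_m2(ring):
    """Approximate area of a small polygon given as (lon, lat) pairs in degrees."""
    lon0, lat0 = ring[0]
    mean_lat = math.radians(sum(lat for _, lat in ring) / len(ring))
    # Offsets from the first vertex, in metres, on a local flat projection.
    pts = [
        (
            math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(mean_lat),
            math.radians(lat - lat0) * EARTH_RADIUS_M,
        )
        for lon, lat in ring
    ]
    # Shoelace formula; a repeated closing vertex contributes nothing.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def estimate_capacity(ring, m2_per_space=30.0):
    """Rough number of spaces for a surface lot; multi-storey garages need more input."""
    return round(polygon_area_m2(ring) / m2_per_space)
```

A rectangle of roughly 60 m by 50 m comes out at about 3000 m², or around 100 spaces under the assumed ratio. For garages, the number of levels would have to come from tags or another dataset.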

I think the biggest thing that would help generate better trip data would be census. In the US, there are census tracts like this that chop up the area into pretty small regions, then assign a bunch of demographics to each tract -- mainly a population count. Then there's the American Community Survey that has questions like "how many cars owned per household" and "how many trips > 5 miles taken per day." The sample size is probably really small, but it's better than nothing, and I think the survey results are correlated with census tracts. So from this, the really simple trip generation model would take the number of people inside a tract, randomly distribute them to houses in that area (using OSM's residential vs commercial building tagging), guess a workplace/school/common destination for them using stats on how far people in that area travel, and so on.
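A minimal sketch of the "really simple trip generation model" described above: place each resident of an area in a random residential building, then pick a workplace with a preference for destinations near a typical travel distance. The weighting formula, the function names and the building IDs in the usage note are assumptions for illustration. A real model would draw the distance statistics from survey data.

```python
import math
import random


def pick_workplace(rng, home_xy, workplaces, typical_km):
    """Choose a workplace, favouring ones whose distance is close to typical_km."""
    ids = sorted(workplaces)
    weights = []
    for wid in ids:
        wx, wy = workplaces[wid]
        dist = math.hypot(wx - home_xy[0], wy - home_xy[1])
        # Illustrative weighting: 1 at the typical distance, falling off either side.
        weights.append(1.0 / (1.0 + abs(dist - typical_km)))
    return rng.choices(ids, weights=weights, k=1)[0]


def generate_trips(num_residents, homes, workplaces, typical_km, seed=0):
    """Return a list of (person, home_id, workplace_id) tuples.

    homes and workplaces map building IDs to (x_km, y_km) positions.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    home_ids = sorted(homes)
    trips = []
    for person in range(num_residents):
        home = rng.choice(home_ids)
        work = pick_workplace(rng, homes[home], workplaces, typical_km)
        trips.append((person, home, work))
    return trips
```

Calling `generate_trips(100, {"h1": (0, 0), "h2": (0.5, 0.2)}, {"w1": (3, 4), "w2": (0.1, 0.1)}, 5.0)` yields 100 home-to-work pairs, most of them ending at w1 because it lies about 5 km away. Departure times and travel modes are left out of this sketch.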

I see https://en.wikipedia.org/wiki/Demographics_of_Berlin, and drilling down, https://en.wikipedia.org/wiki/Tiergarten,_Berlin is a small area with a population of ~12.5k. So I think we could use this! I haven't looked for where the wiki article pulls data; if there's a more consolidated format that ideally has the GPS coordinates of each neighborhood with this population count, that'd be a huge step forward.

dabreegster avatar Jul 13 '20 16:07 dabreegster

With respect to the Census Tracts / finding a corollary in Berlin:

So there are various administrative units that Berlin as a city is broken up into. A coworker actually did a nice breakdown of them (in English) here: https://lab.technologiestiftung-berlin.de/projects/spatial-units/en/. The example you found, Tiergarten, is an example of an Ortsteil/locality. Off the top of my head, I'm not actually sure what the smallest possible unit is (there are a few different systems for breaking up the city that are used in different contexts, so it's not necessarily straightforward to arrange all of the units from smallest to biggest). I guess what we want is the smallest possible unit for which we also have demographic data?

I think the smallest unit for which we would have overall population data is a Planungsraum/Planning Area. The population data is available here: https://daten.berlin.de/datensaetze/einwohnerinnen-und-einwohner-berlin-lor-planungsr%C3%A4umen-am-31122018

We also have the geometry of the Planungsräume available in various geospatial formats: https://data.technologiestiftung-berlin.de/dataset/lor_planungsgraeume/en (I think this file is slightly outdated though, we'll have to update it first).

I don't know if we'll be able to find additional census-like data for that level of specificity, however. Within Berlin itself I haven't seen any sort of survey data around mobility habits.

@Lisa-Stubert found this dataset, which probably has what we want? https://daten.clearingstelle-verkehr.de/279/ It includes a regional level, where an area is broken up into a grid of squares that are at least 500x500m and have at least 500 residents in them. The data, as far as I can tell, would include information on the average trip length from a given cell, whether the people living there own cars, etc. Unfortunately, it is not available as open data and I suspect it is only available for a fee. Moreover, the data seems like it is likely to be fairly complex, so even if we were to acquire it, I think it might be beyond our resources at least to integrate it into the program.

I can keep poking around to see if I find anything else at the Berlin level, but at least right now, we definitely have population data for sub-units of Berlin.

tori-d avatar Jul 15 '20 10:07 tori-d

The spatial breakdown article is quite helpful! Thanks for finding the datasets. I think these will work for figuring out how many residents to assign to buildings. Would you mind sanity checking my interpretation of the population CSV -- Google Translate seems fine, but you never know. I found Körnerstraße in the KML, with spatial_name of 01011104. That matches RAUMID in the population csv. The E_E column is total residents, so that's 4759 in that area, right?

The 500x500m trip length dataset may be helpful, but if it's not open, not a priority.

Here's a proposal for some next steps:

  • I'll add Berlin to the set of maps I regularly convert when the map format changes. The larger boundary suggested here turned out a bit too big, so I'll start with the city core. If anyone wants to use geojson.io or geoman.io to draw multiple boundaries, just send the coordinates along. The regions can overlap slightly, they don't have to cover everything (so not a strict partition in the mathematical set sense) -- an example is:

Screenshot from 2020-07-15 08-05-01

  • I'll make the import pipeline grab the shapefile and population CSV, then emit a file that'll let us visualize the regions with the population count.

  • I'll start researching approaches to reasonably distribute the number of people in a region into individual buildings. @matkoniecz came up with some heuristics for guessing building sizes using OSM tags.
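A minimal sketch of distributing a region's population over its buildings in proportion to a per-building weight, using the largest-remainder method so that the whole-number counts add up exactly to the regional total. The weight is assumed to be something like footprint area times number of levels. That choice, the function name and the building IDs in the usage note are assumptions and not the heuristics mentioned above.

```python
def distribute_residents(total, building_weights):
    """Split `total` residents across buildings proportionally to their weights.

    building_weights maps building ID -> non-negative weight.
    Returns a dict of integer counts that sums to `total` (or all zeros if
    every weight is zero).
    """
    weight_sum = sum(building_weights.values())
    if weight_sum <= 0:
        return {b: 0 for b in building_weights}
    exact = {b: total * w / weight_sum for b, w in building_weights.items()}
    counts = {b: int(x) for b, x in exact.items()}  # round down first
    leftover = total - sum(counts.values())
    # Hand the remaining residents to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda b: exact[b] - counts[b], reverse=True)
    for b in by_remainder[:leftover]:
        counts[b] += 1
    return counts
```

Using the 4759 residents from the Körnerstraße example and weights `{"a": 300.0, "b": 100.0, "c": 100.0}`, the result is 2855, 952 and 952 residents. Buildings tagged as non-residential can be given a weight of zero.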

dabreegster avatar Jul 15 '20 15:07 dabreegster

First two steps done. The importer grabs the shapefile and the population CSV, then adds a num_residents attribute to the shapefile. Here's a way to quickly visualize it:

screencast

I've started including Berlin in the default binary release, because the city center is tiny (only adds 20MB). This'll hopefully make it easier to iterate without any of you needing to build from source or run the updater. I'll post the build here when it's done; github is having partial outages right now.

dabreegster avatar Jul 16 '20 18:07 dabreegster

Linux: https://github.com/dabreegster/abstreet/suites/925031782/artifacts/11329941 Mac: https://github.com/dabreegster/abstreet/suites/925031782/artifacts/11329942 Windows: https://github.com/dabreegster/abstreet/suites/924669830/artifacts/11322491

To follow the video above, you'll also need https://www.dropbox.com/s/pbovhfr6zw0s0ob/planning_areas.bin.zip?dl=0. Unzip so that data/input/berlin/planning_areas.bin exists

dabreegster avatar Jul 16 '20 19:07 dabreegster

The spatial breakdown article is quite helpful! Thanks for finding the datasets. I think these will work for figuring out how many residents to assign to buildings. Would you mind sanity checking my interpretation of the population CSV -- Google Translate seems fine, but you never know. I found Körnerstraße in the KML, with spatial_name of 01011104. That matches RAUMID in the population csv. The E_E column is total residents, so that's 4759 in that area, right?

Yep, this is the correct interpretation!
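A minimal sketch of the join confirmed above: look up each planning area's spatial_name in the population CSV by RAUMID and attach the E_E value as num_residents. The semicolon delimiter and the reduced two-column sample are assumptions, and the second area in the usage note is hypothetical. The IDs are kept as strings so that leading zeros such as in 01011104 are preserved.

```python
import csv
import io

# Reduced sample with only the two columns discussed above; the real file
# has more columns, and its delimiter is assumed here to be a semicolon.
SAMPLE_CSV = "RAUMID;E_E\n01011104;4759\n"


def load_population(csv_text, delimiter=";"):
    """Map RAUMID (kept as a string) to the total resident count E_E."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return {row["RAUMID"]: int(row["E_E"]) for row in reader}


def add_num_residents(areas, population):
    """Return copies of the areas with num_residents set (None if unmatched)."""
    return [
        dict(area, num_residents=population.get(area["spatial_name"]))
        for area in areas
    ]
```

With `areas = [{"name": "Körnerstraße", "spatial_name": "01011104"}]`, the first result carries `num_residents` of 4759. An area whose ID is missing from the CSV gets `None`, which makes gaps easy to spot before visualizing.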

tori-d avatar Jul 17 '20 09:07 tori-d

Thanks a lot, @dabreegster.

Linux: https://github.com/dabreegster/abstreet/suites/925031782/artifacts/11329941 Mac: https://github.com/dabreegster/abstreet/suites/925031782/artifacts/11329942 Windows: https://github.com/dabreegster/abstreet/suites/924669830/artifacts/11322491

Those versions have a "trips between home and work" traffic setting. Is this already using the population count?

jvolker avatar Jul 22 '20 14:07 jvolker

Two quick observations:

  • I've noticed using that setting the area around "Kottbusser Tor" is a lot busier in reality.
  • The CPU is at 100% and the simulation very slow using that setting. I assume the city center area is still too big.

jvolker avatar Jul 22 '20 14:07 jvolker

I've noticed using that setting the area around "Kottbusser Tor" is a lot busier in reality.

Are there also unusually quiet areas? (for now Kraków has some weird hot spots, especially pedestrian hot spots and some unusually quiet areas, mostly as indirect results of issues that I already reported)

The CPU is at 100% and the simulation very slow using that setting. I assume the city center area is still too big.

For me it is also consuming plenty of resources, with a laggy simulation. Not sure whether it is simply the result of many trips, or whether something about the traffic generator triggers unusually poor performance.

matkoniecz avatar Jul 22 '20 14:07 matkoniecz