
Infer default speed limits

Open westnordost opened this issue 4 years ago • 20 comments

This issue ticket's purpose is to

  1. get feedback from you guys if inferring default speed limits is desired at all
  2. get feedback from you guys if the default speed limit data as provided is in a format you are able to make use of etc.
  3. finally, serve as a starting-area / feature request to implement this

Assumptions

The maximum allowed speed on any given road is an important factor to rate the costing of edges in the graph and travel time.

maxspeed coverage in OSM is sketchy and likely will always be: Currently, only about 12% of all roads have a speed limit tagged.

Additionally, maxspeed is just for cars. Other vehicles such as trucks, buses, motorhomes, cars with trailers or motorcycles will, depending on the road type, have different maximum speed limits that are often below the signed one.

Implicit speed limits

When speed limits are not signed, the default speed limits apply. (Maybe more so outside the United States. I heard that in the US, limits are usually signed explicitly.) Regardless of the reason why maxspeed is missing - nothing is signed or information is still missing - the actual speed limits can be inferred by looking at the legislation.

For example, in Hungary, the default speed limits are as follows

  • 20 km/h on a living street
  • 30 km/h on a bicycle road
  • 50 km/h on any other urban road
  • 90 km/h on a non-urban road, except for vehicles pulling a trailer, vehicles with a GVWR of 3.5t or more, buses or trucks, for which it is 70 km/h
  • 110 km/h on motorroads, except for the vehicle types mentioned above for which it is still 70 km/h
  • 130 km/h on a motorway. For buses however, it's 100 km/h. For vehicles, pulling a trailer, with a GVWR of 3.5t or more and trucks, it is 80 km/h

So, if a road in Hungary is tagged with e.g. highway=living_street or motorroad=yes, we can easily infer the implicit speed limit from the above information. Hungary however is a rather easy example.

Sourcing implicit speed limits

Legislation such as the above has been collected over the years on the OSM wiki page Default speed limits in a well-researched (with links to the actual legislation) and machine-readable form. @ianthetechie and I have been working on a Python parser for this table that outputs the information in it as a JSON file so that it can easily be consumed by (routing) software.

This is what it currently looks like: default_speeds.json

See how it looks for Hungary
"HU": [
    {
      "filter": "highway=living_street or living_street=yes",
      "name": "living street",
      "tags": {
        "maxspeed": "20"
      }
    },
    {
      "filter": "bicycle_road=yes or cyclestreet=yes",
      "name": "bicycle road",
      "tags": {
        "maxspeed": "30"
      }
    },
    {
      "filter": "{is_urban}",
      "name": "urban",
      "tags": {
        "maxspeed": "50"
      }
    },
    {
      "tags": {
        "maxspeed": "90",
        "maxspeed:bus": "70",
        "maxspeed:hgv": "70",
        "maxspeed:conditional": "70 @ (weightrating>3.5); 70 @ (trailer)"
      }
    },
    {
      "filter": "motorroad=yes",
      "name": "motorroad",
      "tags": {
        "maxspeed": "110",
        "maxspeed:bus": "70",
        "maxspeed:hgv": "70",
        "maxspeed:conditional": "70 @ (weightrating>3.5); 70 @ (trailer)"
      }
    },
    {
      "filter": "highway~^motorway(_link)?$",
      "name": "motorway",
      "tags": {
        "maxspeed": "130",
        "maxspeed:bus": "100",
        "maxspeed:hgv": "80",
        "maxspeed:conditional": "80 @ (weightrating>3.5); 80 @ (trailer)"
      }
    }
  ]
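As a rough illustration of how a consumer might read one of these entries, here is a minimal Python sketch that picks the limit for a vehicle type from an entry's "tags" object. The function name speed_for_vehicle is hypothetical, and the sketch assumes every value is a plain number in km/h; real data may contain units or non-numeric values, which this does not handle. It also ignores maxspeed:conditional entirely.

```python
# Tags of the Hungarian "motorway" entry, copied from the JSON above.
MOTORWAY_TAGS = {
    "maxspeed": "130",
    "maxspeed:bus": "100",
    "maxspeed:hgv": "80",
    "maxspeed:conditional": "80 @ (weightrating>3.5); 80 @ (trailer)",
}


def speed_for_vehicle(tags, vehicle=None):
    """Return the vehicle-specific limit if one is tagged, else the general one.

    Hypothetical helper: assumes plain numeric km/h values.
    """
    if vehicle is not None:
        specific = tags.get(f"maxspeed:{vehicle}")
        if specific is not None:
            return int(specific)
    return int(tags["maxspeed"])


print(speed_for_vehicle(MOTORWAY_TAGS))                # general limit
print(speed_for_vehicle(MOTORWAY_TAGS, "bus"))         # bus-specific limit
print(speed_for_vehicle(MOTORWAY_TAGS, "motorcycle"))  # falls back to general
```

A vehicle type without its own key simply falls back to the plain maxspeed value.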

Inherent complexity

Already in the example for Hungary, the following becomes clear:

Filter-based mapping

There cannot be a 1:1 mapping of highway=X to speed limit rules because

  • there is no (single) tag for "urban road". It may be sourced from other information such as source:maxspeed=HU:urban and similar tags, by looking at the location of traffic_sign=city_limit signs, whether or not a road is in- or outside of a respective city limit boundary or fuzzily by looking at tags such as lit=yes/no and similar (because roads outside of settlements are usually not lit)
  • sometimes different tags are synonymous in OSM
  • sometimes the legal road type matches multiple different tags on OSM
  • sometimes the legal road type depends on such things as whether a road is lit or not (e.g. United Kingdom), whether the oncoming traffic is segregated (e.g. Germany) or whether the road is paved at all (e.g. Georgia, US)

Instead, it must be based on filtering by OSM tags. For example, the default speed limit in Illinois for roads where the oncoming traffic is segregated and there are 2 or more lanes in each direction would be 65 mph. For interstates, it would be 70 mph. The filters (in pseudo-Overpass-Wizard, ~ meaning it would be a regex-match) for that would be:

  • oneway ~ ^yes|-1$ and lanes >= 2 and dual_carriageway != no -> 65 mph
  • ref ~ ^I and highway ~ ^motorway|motorway_link$ -> 70 mph

To have a parser for such pseudo-Overpass-Wizard syntax adds quite some complexity, I believe.
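To give a feeling for the size of that complexity, here is a minimal Python sketch of an evaluator for a small subset of such filters. It is not the wiki's actual grammar: it assumes clauses of the form key OP value joined by " and " / " or " without parentheses, does not support placeholders such as {is_urban}, and treats ~ as a regex search (so that ref ~ ^I matches "I 80"). All function names are hypothetical.

```python
import re

# One clause: a tag key, an operator, and the rest of the clause as the value.
CLAUSE = re.compile(r"^\s*([\w:]+)\s*(>=|<=|!=|~|=)\s*(.+?)\s*$")


def clause_matches(clause, tags):
    m = CLAUSE.match(clause)
    if m is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    key, op, value = m.groups()
    actual = tags.get(key)
    if op == "!=":
        # An unset tag is also "not equal" to the value.
        return actual != value
    if actual is None:
        return False
    if op == "=":
        return actual == value
    if op == "~":
        return re.search(value, actual) is not None
    # Remaining operators compare numerically; non-numeric tags never match.
    try:
        number, bound = float(actual), float(value)
    except ValueError:
        return False
    return number >= bound if op == ">=" else number <= bound


def filter_matches(expr, tags):
    """True if any " or " alternative has all of its " and " clauses matching."""
    return any(
        all(clause_matches(c, tags) for c in alternative.split(" and "))
        for alternative in expr.split(" or ")
    )


DUAL = "oneway ~ ^yes|-1$ and lanes >= 2 and dual_carriageway != no"
print(filter_matches(DUAL, {"oneway": "yes", "lanes": "2"}))
```

Even this subset needs decisions the table does not spell out in the text above, for example how an unset tag behaves under != and whether ~ has to match the whole value.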

Rules for when multiple road types match

For example, a bicycle road is also an urban road, so both filters match for an urban bicycle road. Which speed limit applies? The wiki page explains the rules by which the right road type should be selected. This is not super complex, but it is one more step before one can make proper use of the data.

Reverse country/state geocoder

Not every motorway with a ref code starting with I is a US interstate highway. This rule may only be applied in the US. Thus, the router must know in which country/state any given road is located. (Should be no problem though if you use a database with geospatial extensions. I can give pointers.)

Advanced maxspeed and maxspeed:conditional parsing

The information in the default speed limits table is translated into OSM tags, including maxspeed:conditional and maxspeed:<vehicle type>. Thus, to interpret the limits correctly, it's necessary for the (router) software to also parse the information in these tags correctly. Whether this adds complexity or not depends on if Valhalla can already parse these, as these are normal OSM maxspeed tags.
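For the simple conditions that appear in the Hungarian example, the parsing could be sketched in Python as below. This is only an assumption-laden illustration: it handles entries of the form "value @ (condition)", where the condition is either a bare property name (trailer) or a numeric comparison (weightrating>3.5). Time-based or unparenthesised conditions are not covered, and the vehicle dictionary and function names are hypothetical.

```python
import re

# One "value @ (condition)" entry; semicolons inside the parentheses are kept.
ENTRY = re.compile(r"\s*([^@;()]+?)\s*@\s*\(([^()]*)\)")
COMPARISON = re.compile(r"(\w+)\s*(>=|<=|>|<)\s*([\d.]+)")


def parse_conditional(value):
    """Split a maxspeed:conditional value into (speed, condition) pairs."""
    return [(speed, cond.strip()) for speed, cond in ENTRY.findall(value)]


def applies(condition, vehicle):
    """Check one condition against a dict of (hypothetical) vehicle properties."""
    m = COMPARISON.fullmatch(condition)
    if m is None:
        # Bare property such as "trailer": true if the vehicle has it set.
        return bool(vehicle.get(condition))
    name, op, bound = m.group(1), m.group(2), float(m.group(3))
    actual = vehicle.get(name)
    if actual is None:
        return False
    return {
        ">": actual > bound,
        ">=": actual >= bound,
        "<": actual < bound,
        "<=": actual <= bound,
    }[op]


def matching_limits(value, vehicle):
    """Return the speeds of all entries whose condition applies to the vehicle."""
    return [int(s) for s, c in parse_conditional(value) if applies(c, vehicle)]


RURAL = "70 @ (weightrating>3.5); 70 @ (trailer)"
print(parse_conditional(RURAL))
print(matching_limits(RURAL, {"weightrating": 7.5}))
```

The sketch deliberately stops at listing the entries that apply; which one wins when several match is left to the consumer.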

Your Feedback necessary

So, the mentioned parser hasn't been worked on for quite some time because we need feedback from you. No point in creating something if it isn't used or doesn't even have the chance of being used. See the points at the very beginning of this ticket.

I may have some time in the near future to improve on this and also maintain the source data, so:

Do you think this data is useful? Can you work with this data? What is necessary to work with it? Do you have suggestions on how else this data could be represented? Do you have other questions?

westnordost avatar Sep 22 '21 19:09 westnordost

@westnordost my first question to you is which goal are you targeting by updating the speed limit information:

  1. trying to get more accurate routes/ETAs OR
  2. trying to have better coverage of speed limit information (eg to display as metadata)

if its the former this seems fairly well related to: https://github.com/valhalla/valhalla/issues/3021. i can give an update on the status there. we have written a scraper that uses the mapillary api to pull data and map match it to the road network. from there we measure the speeds on the types of roads those speeds were seen. we take the mean of those, by administrative area and use those when setting the speed. ive hand created some of those already and this is currently supported in the tile building process. the results are pretty good if the time of day that you are routing is during the day time and not at peak rush hour in a heavily populated city. basically its not good at extremes either high volume or super low volume. i think this is a good tradeoff.

if its the latter then yes i fully support making better guesses for speedlimit info when its not available. the trick will be to actually annotate it as our best guess rather than tagged and we may have a bit (pun intended) of trouble with that wrt backwards compatibility. maybe we could make it a config option though that people opt into and tagged can now mean tagged or inferred.

kevinkreiser avatar Sep 22 '21 20:09 kevinkreiser

I read through #3021, I didn't understand everything but basically your approach aims to focus on real-world speeds, not legal speed limits. I agree that this seems to be the more useful information. But I believe it must be very current to be useful.

  1. trying to get more accurate routes/ETAs

I was primarily thinking of that. I know, it is just a maximum speed, not the actual speed. But people tend to drive the maximum allowed speed unless there is congestion or there are other impediments like crossings, intersections, traffic lights, width and curvature of the road etc. The former is usually dependent on the day, time and location (not location = Hanover, but location = that particular road in Hanover). For the latter, there are tags for this so I hope this is already taken into account anyway.

So, handling congestion and traffic jams sounds to me more like a thing that is best handled like Google does it (showing and taking into account live traffic).

Edit: Or well, that is one approach. When you have enough data, like Google does probably, you probably don't even need to have any smart algorithms that take into account traffic lights, width of road etc. pp. because you can fully rely on current (or yesterday's etc) data.

  2. trying to have better coverage of speed limit information (eg to display as metadata)

Not sure if I understand what you mean by that. As a user of navigation software, I'd probably not want to be notified of the (assumed) speed limit of a road in the absence of any signs. People with a driver's license should be aware of the default speed limits. The worst thing that could happen is that the navigation software displays a too high max speed (because the lower signed maxspeed hasn't been tagged in OSM yet). Motorways in Germany famously have no speed limit by default. That doesn't mean that many/most sections of the motorway aren't actually signed to have a reasonable limit.

Edit: I'd also add a point 3, which is speeds for non-normal-cars, such as trucks, buses, motorhomes, cars with trailers, goods vehicles of various weights, motorcycles etc. An approach based on mean traffic speeds cannot cover this. E.g. let's say you want a route from Hanover to Middletown but you have a trailer to pull with your car.

[...] making better guesses for speedlimit info when its not available

Most often, it is not even a guess but can be inferred at 100% certainty. E.g. a highway=unclassified in an urban area may have a speed limit of 50 km/h. But it may also be lower, in case the maxspeed information is simply missing. But if there is the tag source:maxspeed=HU:urban, we know for certain that there is no sign and the speed limit is 50 km/h.

westnordost avatar Sep 22 '21 21:09 westnordost

@westnordost in my opinion and experience real-world speeds will statistically beat legal speed limits in almost every situation when you are talking about route accuracy or accuracy of ETA. i think i fundamentally disagree with this statement:

people tend to drive the maximum allowed speed

you started to put some qualifiers on there and i think that yeah, the qualifier is that people actually travel as fast as they can (its why they normally ask the router for the fastest route) within the bounds they feel are safe, those bounds may be:

  1. to avoid crashing into heavy traffic in front of them
  2. to avoid getting caught on a speed camera or by a cop
  3. to avoid driving unsafely (when they can go as fast as they want because no one is around)

the speed limit captures number 2 pretty well but i wouldn't be so sure this factor dominates the other two. take motorways for example. i have a bunch of experience just looking at measured data for those classes of roads, and as you know they have better maxspeed coverage in osm. the measured speed for those edges in the graph is almost always faster than the max speed. same for rural areas for even lower class roads.

anyway all of the philosophical stuff aside; i would welcome you to take a crack at supporting more intelligent assignment of default speeds even if it uses a rule-based approach instead of a statistical one. i went the statistical route because i knew that it would fare well and be simpler. it would be completely fine for me to get your implementation in there alongside the statistical one and let the user choose which one they want to use or even use a combination of them. perhaps your implementation could completely replace the current default implementation and, if configured, the statistical one can overrule that as is done today. all that to say, let me know what other questions i could answer more specifically to help you make a decision about whether you want to implement it or not.

reading through the beginning of the issue i can at least say we already know what country and state/province/gliedstaat/canton/prefecture/.... the edge is in so at least that part is already supported. parsing all of the various tags so that we can properly follow the heuristics may be a decent amount of work because we necessarily drop any extra info as early in the data processing as possible. but to know for sure we'll have to get a more exhaustive list and plan it out. let me know how i could best help you think that through

kevinkreiser avatar Sep 22 '21 21:09 kevinkreiser

I agree on your rebuttal of my argument at point 2


I am very out of touch with C++ development and have zero knowledge about the Valhalla code base, so (so far) I have had no intention to actually implement the feature itself in Valhalla.

What I had in mind was to work on providing the data in a well-formed err... form. So, progress on researching the legislation on the wiki and weeding out issues both in the spec and the parser that finally produces the JSON. So for starters, I would like to get feedback on whether the maintainers of this project (and other router software) think it makes sense to include this data in the first place (with the complexities involved) and, if we are beyond this first question, whether the format and form in which the data is made available makes sense. (I.e. also more generally, does it make sense that it is a Python script? Would that integrate reasonably well into your build process?)


we necessarily drop any extra info as early in the data processing as possible

I see, so that means that the information, i.e. the filling-in of additional inferred maxspeed-rules would need to happen at the very beginning of the processing, before the OSM data is dropped. Then, only those inferred maxspeed-rules get carried over into the next processing step. Or alternatively, parse the JSON in the first step only to gather which tags should not be dropped because they are needed later for speed limit inferring. Looking at the table, most will probably not be dropped anyway: width, ref, lanes, surface, oneway, motorroad, ...
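The second alternative, gathering the tag keys that the rules depend on, could be sketched in Python roughly as follows. This is a heuristic, not a real parser: it assumes a mapping from country code to a list of entries as in the Hungarian excerpt above (the actual file may be structured differently), and it treats every word that directly precedes a comparison operator as a tag key. The name keys_needed is hypothetical.

```python
import re

# A tag key is assumed to be a word (optionally with colons) followed by an operator.
KEY = re.compile(r"([\w:]+)\s*(?:>=|<=|!=|~|=)")


def keys_needed(rules_by_country):
    """Collect every tag key referenced by any filter expression."""
    keys = set()
    for entries in rules_by_country.values():
        for entry in entries:
            # Entries without a filter (the fallback road type) contribute nothing.
            keys.update(KEY.findall(entry.get("filter", "")))
    return keys


# Only the filters of the Hungarian excerpt; the "tags" objects are omitted.
RULES = {
    "HU": [
        {"filter": "highway=living_street or living_street=yes"},
        {"filter": "bicycle_road=yes or cyclestreet=yes"},
        {"filter": "{is_urban}"},
        {},
        {"filter": "motorroad=yes"},
        {"filter": "highway~^motorway(_link)?$"},
    ]
}

print(sorted(keys_needed(RULES)))
# ['bicycle_road', 'cyclestreet', 'highway', 'living_street', 'motorroad']
```

Placeholders such as {is_urban} yield no keys here, so whatever tags a consumer uses to resolve them would have to be kept separately.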

westnordost avatar Sep 22 '21 22:09 westnordost

Rather than re-inventing the wheel here, you may want to take a look at:

https://wiki.openstreetmap.org/wiki/Key:maxspeed:practical

Ways tagged with that will use it in preference to maxspeed when figuring out which speed value to use (primarily for the auto costing).

One shortcoming with many of these approaches is that Valhalla supports multiple travel modes - not just cars, but scooters, bicycles, and pedestrians. There's currently no way to specify separate "average speeds" for each of them.

danpat avatar Sep 22 '21 22:09 danpat

@westnordost im not sure what you mean when you talk about both a python script and some json. i guess you are saying the "rules" are in the json and the python script, given some input would select the rule that applies and return the speed? personally, i think we would just port the python to c++ and load the json directly. as for which attributes are dropped, we actually keep pretty much all of the above ones that you mentioned so those wouldn't be a problem. im thinking about more exotic tags that we could add to the parser but wouldnt keep in the final data (ie are dropped after parsing and normalizing).

kevinkreiser avatar Sep 22 '21 23:09 kevinkreiser

@danpat that was directed at Kevin's approach? Then it may be more on-topic to add this comment to #3021

@kevinkreiser

  1. The information is collected and maintained in this wiki page: https://wiki.openstreetmap.org/wiki/Default_speed_limits
  2. The wiki-page-parser https://github.com/westnordost/osm-default-speeds is written in Python and produces...
  3. ...this JSON file https://github.com/westnordost/osm-default-speeds/blob/master/default_speeds.json which is supposed to be the legal maxspeed data from that wiki page ready to be consumed by software

So, the wiki-page-parser needn't be part of any build process, one could also just fetch the JSON from somewhere. One could just optionally create the JSON oneself with this python script as part of the build process to ensure one always has the newest data from that wiki page.

westnordost avatar Sep 22 '21 23:09 westnordost

@westnordost ok so yeah all we'd need to do is ingest the json and use the tags of a given edge (way) to look up the speed. unless its using some very exotic tags i dont see why this would be difficult at all.

kevinkreiser avatar Sep 22 '21 23:09 kevinkreiser

Do you have some kind of filter syntax parsing (e.g. parse something like oneway ~ ^yes|-1$ and lanes >= 2 and dual_carriageway != no) already built in in valhalla or would this have to be created? (Or should the tag filter be supplied in another form that is more easily consumable?)

westnordost avatar Sep 23 '21 09:09 westnordost

@westnordost the way it works in valhalla is we parse osm and while we are doing that we normalize the data as much as possible into its final state. so we may look at 4 or 5 different *name* tags but in the end we keep only the ones that make sense and fit into our target schema. as a schema osm is very inefficient so as soon as possible we convert key value strings into integers and bits in well-formed structures that are predefined. this parsing and normalization happens partly in lua and partly in c++ (graph.lua and pbfgraphparser.cc). it seems to me from your question that i still dont quite understand how you envision it to work. this is what i understand:

  1. we use a python script to scrape the wikipage to create a ruleset in json
  2. we write some c++ to: a. load the json ruleset b. use the rules and the attributes attached to a way to assign a default speed

wrt to the example you put above with the overpass syntax we dont have some DSL to provide that we have c++ objects which let you do this:

way.oneway && way.lane_count >= 2 && !way.dual_carriageway

the latter property is not known in OSM datasets (although im sure there is some tiny number of ways that are tagged as such) and so we do not support it. we've always talked about deriving it but its a non trivial task. commercial datasets have this data well-marked.

kevinkreiser avatar Sep 23 '21 11:09 kevinkreiser

wrt to the example you put above with the overpass syntax we dont have some DSL to provide that we have c++ objects which let you do this: way.oneway && way.lane_count >= 2 && !way.dual_carriageway

There is a mistake in your translation: The filter checks if dual_carriageway is not "no". So either if it is unset or "yes" would be fine. It is written that way because a teeny tiny minority of dual carriageways are actually tagged like that, most are just oneway=yes but in case a oneway is explicitly set with dual_carriageway=no it should not be detected as a dual carriageway.
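The difference between the two readings can be made concrete with a small Python sketch. Both functions are hypothetical illustrations operating on a plain dict of OSM tags; neither is Valhalla code. The boolean variant assumes that an unset dual_carriageway property ends up as false.

```python
def _lanes(tags):
    """Lane count as an int, or 0 if missing or not a plain number."""
    try:
        return int(tags.get("lanes", "0"))
    except ValueError:
        return 0


def boolean_translation(tags):
    """Mirrors `way.oneway && way.lane_count >= 2 && !way.dual_carriageway`."""
    oneway = tags.get("oneway") in ("yes", "-1")
    dual = tags.get("dual_carriageway") == "yes"  # unset becomes False
    return oneway and _lanes(tags) >= 2 and not dual


def filter_semantics(tags):
    """Mirrors `... and dual_carriageway != no`: unset or "yes" both pass."""
    oneway = tags.get("oneway") in ("yes", "-1")
    return oneway and _lanes(tags) >= 2 and tags.get("dual_carriageway") != "no"


base = {"oneway": "yes", "lanes": "2"}
for extra in ({}, {"dual_carriageway": "yes"}, {"dual_carriageway": "no"}):
    tags = {**base, **extra}
    print(extra, boolean_translation(tags), filter_semantics(tags))
```

The two versions agree when the tag is unset but give opposite answers for both explicit values, which is the case the filter was written to handle.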


So, anyway. This means that you can not make use of the JSON right away, because you either need to parse that filter syntax, or translate that filter syntax to Lua or C++ code somehow - which is also a kind of parsing. But it is also not possible to adapt the syntax as given in the wiki to be like Lua-code because you rename the properties before they are processed.

westnordost avatar Sep 23 '21 11:09 westnordost

OK aside from the pedantic example, it seems the only reasonable thing would be to implement the whole thing on the Lua side. This has the upside of allowing processing before we normalize the tags but has the downside of being before we add things like administrative info. It also means we have to implement your filter syntax unless you write it in Lua to begin with

kevinkreiser avatar Sep 23 '21 12:09 kevinkreiser

What would the Lua look like for the above example? Maybe the translation of the syntax to Lua code could be done with a simple string replace? (and -> &&, prepend a "way." to every tag etc)

westnordost avatar Sep 23 '21 12:09 westnordost

I've been working on this, see

  1. https://www.openstreetmap.org/user/westnordost/diary/399412
  2. https://wiki.openstreetmap.org/wiki/Default_speed_limits

I now finished working on the parser that creates the easily-parseable JSON of the data in the table.

I set up that github project so that anyone can start the Github workflow to generate a fresh default_speeds.json. The one in the repo is just to showcase how it looks.

Next, I will work on a (Kotlin multiplatform) library to spit out maxspeed given a set of tags and a country (subdivision) code. Do you have any feedback on this JSON so far?

westnordost avatar Jul 26 '22 13:07 westnordost

@westnordost cool! the format seems very straightforward to me, the beginning part of the file doesnt seem too useful but further down where you have country-state and then tags looks workable.

i dont know if you saw but i too finished a bunch of work that is very similar to this. rather than using data from the wiki though i used gps traces from mapillary and map matched them to find median speeds for a given highway tag and a given population density. valhalla can already consume this data (https://github.com/OpenStreetMapSpeeds/schema/blob/master/default_speeds.json) as an optional config item to the tile building process.
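The aggregation step described here (median speed per highway tag and density class) could look roughly like the following Python sketch. It is not the actual scraper: the sample tuples are invented purely to make the example runnable, and the density labels are placeholders for whatever classification is really used.

```python
from collections import defaultdict
from statistics import median


def median_speeds(samples):
    """Group (highway, density, kph) samples and take the median per group."""
    groups = defaultdict(list)
    for highway, density, kph in samples:
        groups[(highway, density)].append(kph)
    return {key: median(values) for key, values in groups.items()}


# Invented, illustrative measurements: (highway tag, density class, speed in km/h).
SAMPLES = [
    ("residential", "urban", 28.0),
    ("residential", "urban", 34.0),
    ("residential", "urban", 31.0),
    ("motorway", "rural", 118.0),
    ("motorway", "rural", 126.0),
]

for key, speed in median_speeds(SAMPLES).items():
    print(key, speed)
```

The median is used rather than the mean because it is less sensitive to a few very slow or very fast traces.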

kevinkreiser avatar Jul 28 '22 17:07 kevinkreiser

Hm, interesting! Maybe I should rename "default speed limits" to include "legal" or something. Coming from the legislation, the legal concept of "suburban" is not really used anywhere except in Iowa and New Jersey. If I were to try to group road zones in a scale that works globally, I would use:

  • (residential: residential/business district are legal definitions in the US, everywhere else they do not exist. However, in many regions, there exists the concept of 30-zones (20 mph zone) which are pretty much only used to speed limit residential areas so in effect are very similar)
  • urban: everything else within city limits that are not through-highways (in most countries 50-60 km/h / 35 mph). "urban" or "built-up area" are legal definitions in pretty much all the world except in some US states
  • rural: everything not within a city or a settlement that are not inter-urban main roads
  • (inter-urban (or however you want to call them): main roads that connect settlements)

westnordost avatar Jul 29 '22 10:07 westnordost

Or, you could use the road types from default (legal) speed limits json to accurately subdivide the different roads per each country/subdivision and then map the mapillary common speed data on that to arrive at a more accurate mapping without having to try to find a global scale (urban/suburban/rural or anything else)

westnordost avatar Jul 29 '22 10:07 westnordost

yeah what i used to do the population based subdivision of the data was to use the road density metric that valhalla computes. its like a measure of how many linear km of road there is per square km of area. of course coastal areas are somewhat screwed up with this approach but it seems to work ok in general

kevinkreiser avatar Jul 29 '22 14:07 kevinkreiser

I am now pretty much finished working on it. It is a Kotlin multiplatform library, so you should be able to use it directly from C++ code. It is unreleased however, because I am looking for feedback:

See the README.md for extensive documentation what it does: https://github.com/westnordost/osm-legal-default-speeds

westnordost avatar Aug 02 '22 22:08 westnordost

I also added a web-demo now: https://westnordost.github.io/osm-legal-default-speeds

westnordost avatar Aug 07 '22 23:08 westnordost

And also the talk from the SotM is online: https://media.ccc.de/v/sotm2022-18524-inferring-default-speed-limits

westnordost avatar Sep 28 '22 21:09 westnordost

very nice, just watched it. it would be fantastic to fill our speed limit info in the tiles using the tool. i am somewhat fearful that we lose a lot of the tags by the time it is to the point in our software when we want to know the speed limit for sure which brings me to the idea that we should do it earlier in the pipeline (ie when we are doing tag parsing). the problem there is its not so convenient to bring in tags from the relations. so perhaps i should educate myself a bit on the tooling you have created to determine how we could best integrate it to get the most out of all of your hard work reading the reams of govt documents (a most unenviable task :wink:)

kevinkreiser avatar Sep 29 '22 00:09 kevinkreiser

Personally, I do not think that the relations are so important. Basically, I added that because @1ec5 mentioned that in the US, the preferred tagging practice is to denote state routes by membership in appropriate route relations. But in reality, those state routes will usually also have a ref tag added on each road segment within that state route as well, if only to have it render correctly on OSM Carto.

The biggest thing to worry about regarding accuracy of the information is to reliably determine if a road is in a built-up area or not (urban or rural). But you already have a heuristic to determine that which you can simply inject here https://github.com/westnordost/osm-legal-default-speeds#replacing-placeholders , so that's fine.

westnordost avatar Sep 29 '22 11:09 westnordost