osm2pgrouting
Tags used for routing
Topic derived from #64. We need to identify:
- Tags needed to detect "road" segments
- Tags needed to assign default Lua values to roads (highway, residential, paved)
- Tags needed to assign default Lua values to vertices (stop lights, barriers)

I mention default Lua values because I think that if the osm_ids of edges and vertices are stored in the database, the Lua values can also be stored in separate tables, and the new Lua values can be set up through a sequence of joins.
Today OSRM uses Lua scripts (http://www.lua.org/) to define extraction profiles for cars, trucks, bicycles, etc. These are simple scripts that interact with the osrm-extract process to tell it which tags you want and what to do with them. This seems like a good way to handle data extraction for osm2pgrouting: users could change how data is extracted without changing the source code. As it stands, the OSRM process would force us to run the extraction once per profile. That would be a good starting point; later, as an enhancement, we might extend the extraction to handle multiple profiles in a single pass, maybe by defining which data goes into which column of the created table. For example, we might create multiple cost columns (pedestrian, car, bike, truck, etc.) and multiple turn restriction tables for different profiles.
Tags to be extracted are currently defined in this file: https://github.com/pgRouting/osm2pgrouting/blob/master/mapconfig.xml. So you are suggesting to use Lua scripts instead of that file. More precisely, you are suggesting a file for car, a file for truck, and so on, and generating one table with cost and reverse_cost for car, cost and reverse_cost for truck, etc. That is, process the OSM file once and get multiple columns from it, plus one turn restrictions table for car, another for truck, another for pedestrian. Did I understand your idea?
But then I ask: just as you have separate turn restriction tables (one for car, one for truck, one for pedestrian, one for bicycle, etc.), why not, instead of cost and reverse_cost columns for car, have a profiles table that uses hstore, with one row holding the car profile, another row holding the truck profile instead of truck cost and reverse_cost columns, and so on?
In any case, if Lua is to be used, a first step is to have a default Lua profile that replaces mapconfig.xml, which is the current default profile for the data loaded into pgRouting.
I think the choice between the XML file and a Lua script depends largely on how much of the osrm-extract code we can reuse as a starting point for osm2pgrouting.
Regarding "Why not instead of a column cost for car and reverse_cost for car, have a table of profiles that use hstore and a row that with the profile of a car, instead of a cost and reverse_cost for truck, have another row in profiles for the truck, etc.?", what are you proposing will be in these profiles? How will that work with the extraction process?
Maybe a better way to think of the Lua script is as a filter: OSM has a large number of nodes and edges, and you need to filter them during extraction because most of them have nothing to do with routing. You may also need to do some additional preprocessing of the extracted data, like computing the length of each edge, or computing the travel time across that edge based on its max speed, on the profile, or on both.
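To make the "filter plus preprocessing" idea concrete, here is a minimal Python sketch of what a profile could do for one way. Python stands in for Lua only for illustration; `CAR_PROFILE`, its speeds, and the function names are hypothetical, and coordinates are assumed to be WGS84 (lat, lon) pairs in degrees. Ways whose highway class is not in the profile are dropped; the rest get a haversine length and a travel time.

```python
import math

# Hypothetical car profile: routable highway classes and default speeds in km/h.
CAR_PROFILE = {"motorway": 90, "primary": 65, "residential": 25}

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def filter_way(tags, coords, profile):
    """Return (length_m, travel_time_s) for a way the profile keeps, else None."""
    speed_kmh = profile.get(tags.get("highway"))
    if speed_kmh is None:
        return None  # not routable under this profile: filtered out
    length_m = sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))
    travel_time_s = length_m / (speed_kmh / 3.6)  # km/h -> m/s
    return length_m, travel_time_s

# Nodes 40553436 and 40611062 from the Portland example later in this thread.
COORDS = [(45.5164602, -122.6424960), (45.5157476, -122.6425043)]
length_m, travel_time_s = filter_way({"highway": "residential"}, COORDS, CAR_PROFILE)
```

The two nodes are roughly 79 m apart, so at the assumed 25 km/h the edge takes a little over 11 seconds.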
The XML can define a list of tags to extract, but it is very limited when it comes to computations. The Lua script, like the XML, is external to the compiled code and can be edited by the user. Just like OSRM, we could ship multiple standard profiles that users can use as is, or copy and modify for their own purposes.
About "How will that work with the extraction process?": interesting question. I need more information to answer it, so I ask the following: what is the difference between extracting the "routable" data from OSM data stored in a database and from OSM data stored in a file?
The routable data in OSM is a very small percentage of the overall data. If your design is to just store the raw OSM data in the database as hstore rather than building the normal node and edge tables, then you need to explain how that will be queried or transformed into the tables that all the algorithms expect as input. You can of course do that, but it implies building a new set of tools specifically to handle OSM data in the database, rather than handling this in osm2pgrouting and presenting the data in the normal table formats. Also, using hstore adds another dependency to pgRouting that is not needed today.
I guess that the OSRM Lua configuration contains quite a lot of information we may not care about.
But my idea with using the OSRM Lua configuration files was that we could just take an OSRM configuration and use it for osm2pgrouting. That way we can use their nice examples and settings (assuming that they are nice).
There is also a chance that it helps with the reverse process: exporting a pgRouting database to OSM format to re-import it into OSRM. But this is nothing more than a guess.
Looking at the existing Lua scripts used by OSRM, they appear to do a few basic things:
- select the tags that the user is interested in
- compute the speed for the segment, looking at whether there is a speed or max. speed tag, or a speed by road class
- look at barriers
- look at traffic signals
We do not handle barriers in pgRouting, but we should basically break the network at the barrier so the edges on either side of the barrier are not connected.
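One way to "break the network at the barrier" is to give the barrier node two distinct vertex labels, so the edge arriving at it and the edge leaving it no longer share a vertex. This Python sketch works on a single way's node list; the `(node, "in")`/`(node, "out")` labels are a hypothetical convention, and it does not handle a barrier at a way's first node, where other ways touching that node would still connect.

```python
def split_at_barriers(way_nodes, barrier_nodes):
    """Split one way into edges that are disconnected at barrier nodes.

    A barrier node gets two labels, (node, "in") and (node, "out"): the edge
    arriving at the barrier ends at the first, the edge leaving starts at the
    second, so no vertex connects them.
    """
    edges = []
    current = [way_nodes[0]]
    for node in way_nodes[1:]:
        if node in barrier_nodes:
            current.append((node, "in"))
            edges.append(current)
            current = [(node, "out")]
        else:
            current.append(node)
    if len(current) > 1:  # a barrier at the last node leaves nothing to add
        edges.append(current)
    return edges
```

For example, a way 1-2-3-4 with a gate at node 3 becomes two edges, `[1, 2, (3, "in")]` and `[(3, "out"), 4]`.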
Traffic signals are used to reduce the average speed on the segment containing the signal by some amount, like 2 mph.
This allows you to easily change the costs if you are extracting for a car, person, bicycle, truck, tractor trailer, etc. It would also be possible to create a multi-profile extractor that does these computations and stores the results.
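A Python sketch of the multi-profile idea: one pass over a segment produces a cost in seconds for every profile, with the traffic-signal slowdown applied per profile. All speeds, the 2 mph car penalty, and the zero pedestrian penalty are illustrative assumptions, not values from OSRM or osm2pgrouting; -1 marks a profile that cannot use the segment.

```python
MPH_TO_KMH = 1.609344

# Hypothetical base speeds (km/h) per profile and highway class; None = not allowed.
PROFILES = {
    "car": {"residential": 40.0, "footway": None},
    "foot": {"residential": 5.0, "footway": 5.0},
}
# Speed reduction on a segment that contains a traffic signal ("some amount like 2 mph").
SIGNAL_PENALTY_KMH = {"car": 2 * MPH_TO_KMH, "foot": 0.0}

def segment_speed(profile, highway, has_signal):
    """Speed in km/h for one profile, or None if the profile cannot use the segment."""
    speed = PROFILES[profile].get(highway)
    if speed is None:
        return None
    if has_signal:
        speed = max(speed - SIGNAL_PENALTY_KMH[profile], 1.0)  # keep the speed positive
    return speed

def costs_for_all_profiles(highway, length_m, has_signal):
    """Single pass over one segment: cost in seconds per profile, -1 if not traversable."""
    costs = {}
    for profile in PROFILES:
        speed = segment_speed(profile, highway, has_signal)
        costs[profile] = -1 if speed is None else length_m / (speed / 3.6)
    return costs
```

Each profile's costs could then go into its own columns, or into its own table, which is the design question discussed further on.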
So, for example, with length units in meters (because in my country we use meters) and cost units in seconds (because I am using meters for length):

| id | cCar | rcCar | cFoot | rcFoot | length | note |
|----|------|-------|-------|--------|--------|------|
| 1 | -1 | -1 | 50 | 50 | 100 | this is a footway |
| 10 | 5 | -1 | -1 | -1 | 100 | this is a highway and is one-way |

With length units in miles (because the country to the north uses miles) and cost units still in seconds, to be consistent with the table above:

| id | cCar | rcCar | cFoot | rcFoot | length |
|----|------|-------|-------|--------|--------|
| 1 | -1 | -1 | 50 | 50 | 0.062 |
| 10 | 5 | -1 | -1 | -1 | 0.062 |
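The two tables differ only in the length column: 100 m is about 0.062 miles (1 mile = 1609.344 m), while the costs stay in seconds. A small Python sketch that builds one row of such a table; the `edge_row` helper and its `length_unit` parameter are hypothetical.

```python
METERS_PER_MILE = 1609.344

def edge_row(edge_id, costs, length_m, length_unit="m"):
    """Build one row of the multi-profile edge table.

    costs maps a profile name to (cost, reverse_cost) in seconds; -1 = not allowed.
    Only the length column depends on the chosen unit; costs stay in seconds.
    """
    if length_unit == "m":
        length = length_m
    elif length_unit == "mi":
        length = round(length_m / METERS_PER_MILE, 3)
    else:
        raise ValueError(f"unsupported length unit: {length_unit}")
    row = {"id": edge_id}
    for profile, (cost, reverse_cost) in costs.items():
        row["c" + profile] = cost
        row["rc" + profile] = reverse_cost
    row["length"] = length
    return row
```

Calling `edge_row(1, {"Car": (-1, -1), "Foot": (50, 50)}, 100)` reproduces the first row of the meters table; passing `"mi"` gives the 0.062 of the miles table.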
From the example:
- The speed units don't need to be miles/hour, nor does the length need to be in miles, nor the time in hours. (The same applies to km/h; I prefer m/s because it needs fewer decimals, but meters/minute is OK too.) Only three countries don't use the metric system. We have options:
  a) the default profile follows the majority of the countries.
  b) the default profile follows the minority of the countries.
  c) the default profile follows whatever OSRM is doing.
  d) we capture the units somehow based on the user's requirements.
- A lot of questions arise:
  - How is the incremental loading going to be performed?
  - Are the profiles all going to be loaded at the same time?
  - Can one be loaded first and then the other?
  - If we load Car first, what happens with edge 1, which is not for cars? Is it going to be added later, when Foot is loaded? Is loading Foot going to take care of filling the other cost/reverse_cost columns with -1?
  - If we load Foot first, what happens to edge 10, which is not for pedestrians? (Same as above.)
  - When we want to add daily changes, how can we handle them?
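One possible answer to the incremental loading questions, sketched in Python: each profile is merged into a shared edge table, edges the new profile does not use get -1, and edges new to the table get -1 for every profile loaded earlier. This assumes edge ids are stable across loads (for example derived from OSM ids), which is exactly what would have to be guaranteed; `load_profile` and `NOT_ALLOWED` are hypothetical names.

```python
NOT_ALLOWED = (-1, -1)  # (cost, reverse_cost) for a profile that cannot use the edge

def load_profile(table, profiles, profile, edges):
    """Merge one profile's (cost, reverse_cost) pairs into a shared edge table.

    table:    dict edge_id -> dict profile -> (cost, reverse_cost)
    profiles: list of profile names loaded so far (updated in place)
    edges:    dict edge_id -> (cost, reverse_cost) extracted for this profile
    """
    for edge_id in edges.keys() - table.keys():
        # New edge: mark it as not allowed for every profile loaded earlier.
        table[edge_id] = {p: NOT_ALLOWED for p in profiles}
    for edge_id, row in table.items():
        # Edges this profile did not extract get -1 for the new profile.
        row[profile] = edges.get(edge_id, NOT_ALLOWED)
    profiles.append(profile)

# Load Car first, then Foot, using the edges from the example tables.
table, profiles = {}, []
load_profile(table, profiles, "Car", {10: (5, -1)})   # edge 10: highway, one-way
load_profile(table, profiles, "Foot", {1: (50, 50)})  # edge 1: footway
```

With this rule the load order does not matter for the final contents; daily changes would still need a way to delete or update rows, which this sketch does not cover.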
A few comments:
About LENGTH
- I would assume that length can always be retrieved from the geometry, so if units are an issue, I would just drop the length attribute and leave it up to the user.
- Length can be pre-calculated at any time later.
- It may be good to pre-calculate the length, as it usually doesn't change, and storing it may speed up queries that would otherwise always calculate it from the geometry.
About SPEED
- I'm wondering whether it's enough to define speed by road class, for example, or whether OSM data allows setting speed individually per road segment. If the data contains actual speed information (not only max. speed), that might dramatically increase accuracy. But I'm not sure this data is actually available in OSM.
- I would let the user decide what the units are. I'm not sure we need to know them if we leave the cost calculation up to the user as well.
About Loading
- I would use an internal ID for the data, but keep a reference to the OSM ID as an additional attribute. I hope this helps with updating the data later.
- Roads that are not imported because they are not part of the profile are simply not there. If they are imported later with a different profile, this might just create a new dataset (table).
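A Python sketch of these loading comments: every split segment gets an internal id, keeps its OSM way id as a plain attribute (so several segments can point to the same OSM id), and has its length derived from its geometry. It assumes coordinates are already in a projected, meter-based CRS; in PostGIS the same length would typically come from `ST_Length` on a geography or projected geometry. `build_edges` is a hypothetical name.

```python
import itertools
import math

def build_edges(split_ways):
    """Give every split segment its own internal id but keep the source OSM way id.

    split_ways: iterable of (osm_way_id, [segment_coords, ...]) where each
    segment is a list of (x, y) points in a projected, meter-based CRS.
    """
    next_id = itertools.count(1)
    edges = []
    for osm_id, segments in split_ways:
        for coords in segments:
            # Length comes from the geometry, so it can be recomputed at any time.
            length_m = sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
            edges.append({"id": next(next_id), "osm_id": osm_id, "length_m": length_m})
    return edges

# Toy coordinates: one OSM way split into two segments, plus an unsplit way.
edges = build_edges([
    (27401115, [[(0, 0), (3, 4)], [(3, 4), (3, 10)]]),
    (5529352, [[(0, 0), (0, 1)]]),
])
```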
There are tags in OSM for actual speed, and they might be qualified by vehicle type. Retaining or using this information during extraction is useful for the different profiles. Also, different profiles might want to extract different classes of edges; for example, a walking profile might want to extract train tracks because you can walk along them.
I think a bigger question here is how do we want to represent the data in the database. For example:
- we could represent the data very much like OSM data, with "id, geom, hstore", then set up a view that somehow maps that to our traditional table layout
- we could represent the data as our more traditional table layout with columns for each profile
- we could create an edge table and each profile can be a separate attribute table that can be joined to the edge table
I think you need to address this point first because it will determine what processing needs to be done. Remember input -> process -> output; in this case the output will be the Postgres tables.
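The third option (an edge table plus one attribute table per profile) can still present the traditional layout through a join. Here is a runnable sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table and view names (`edges`, `car_costs`, `car_ways`) and the vertex ids are hypothetical. It only illustrates the join; noding and topology still have to be done when the tables are built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (id INTEGER PRIMARY KEY, osm_id INTEGER, source INTEGER, target INTEGER);
CREATE TABLE car_costs (edge_id INTEGER REFERENCES edges(id), cost REAL, reverse_cost REAL);

-- Edge 1 is the footway, edge 10 the one-way highway from the example tables.
INSERT INTO edges VALUES (1, 5529352, 100, 101), (10, 122480386, 101, 102);
INSERT INTO car_costs VALUES (10, 5, -1);

-- The traditional per-profile layout is just a join, exposed as a view.
CREATE VIEW car_ways AS
  SELECT e.id, e.source, e.target, c.cost, c.reverse_cost
  FROM edges e JOIN car_costs c ON c.edge_id = e.id;
""")
rows = conn.execute("SELECT * FROM car_ways ORDER BY id").fetchall()
```

Edge 1 has no car costs, so it simply does not appear in `car_ways`; a `foot_costs` table would produce a `foot_ways` view the same way without repeating the geometry.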
I am pasting some parts of OSRM's car Lua profile, taken from https://github.com/Project-OSRM/osrm-backend/blob/master/profiles/car.lua; you can compare it with the foot profile, where the values change: https://github.com/Project-OSRM/osrm-backend/blob/master/profiles/foot.lua. The Lua language allows operations on data of unknown type; for example, a tag value may be "yes" in one record and 1 in another. How and when OSRM uses this information is still obscure to me.
But here is basically what they do:
- They classify their tags:
  access_tag_whitelist = { ["yes"] = true, ["motorcar"] = true, ["motor_vehicle"] = true,....
  access_tag_blacklist = { ["no"] = true, ["private"] = true, ["agricultural"] = true,....
- Based on tag values, they define a speed:
  speed_profile = {
    ["motorway"] = 90,
    ["motorway_link"] = 45,
    ["trunk"] = 85,
- They do the same for surface type values:
  ["cement"] = 80, ["compacted"] = 80, ["fine_gravel"] = 80,
- They have exceptions for speed limits:
  maxspeed_table = { ["ch:rural"] = 80, ["ch:trunk"] = 100, ["ch:motorway"] = 120,
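A Python reading of these tables, assuming: a blacklisted `access` value drops the way, the highway class picks the base speed from `speed_profile`, and a surface speed acts as a cap on that speed. Only the values quoted above are included, so the real car.lua tables are much longer; the whitelist is left out because the excerpt does not show how it is applied, and the name of the surface table is not shown either, so `SURFACE_SPEEDS` is a hypothetical name.

```python
# Only the entries quoted above; the real car.lua tables are much longer.
ACCESS_TAG_BLACKLIST = {"no", "private", "agricultural"}
SPEED_PROFILE = {"motorway": 90, "motorway_link": 45, "trunk": 85}
SURFACE_SPEEDS = {"cement": 80, "compacted": 80, "fine_gravel": 80}

def way_speed(tags):
    """Speed in km/h for a way under this toy car profile, or None to skip it."""
    if tags.get("access") in ACCESS_TAG_BLACKLIST:
        return None  # access explicitly forbidden for cars
    speed = SPEED_PROFILE.get(tags.get("highway"))
    if speed is None:
        return None  # highway class not routable in this profile
    surface_speed = SURFACE_SPEEDS.get(tags.get("surface"))
    if surface_speed is not None:
        speed = min(speed, surface_speed)  # assumed: surface speed caps the class speed
    return speed
```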
Decisions are made in Lua code, for example:
local oneway = way:get_value_by_key("oneway")
if oneway and "reversible" == oneway then
  return
end
- Notice that speed has units. They use Lua code to make decisions based on the speed information stored in those tables:
-- Returns the speed (km/h) parsed from a maxspeed tag value, or 0 if unknown.
-- maxspeed_table_default is defined elsewhere in car.lua (not shown here).
local function parse_maxspeed(source)
  if not source then
    return 0
  end
  local n = tonumber(source:match("%d*"))
  if n then
    if string.match(source, "mph") or string.match(source, "mp/h") then
      n = (n*1609)/1000  -- miles per hour to km/h
    end
  else
    -- parse maxspeed like FR:urban
    source = string.lower(source)
    n = maxspeed_table[source]
    if not n then
      local highway_type = string.match(source, "%a%a:(%a+)")
      n = maxspeed_table_default[highway_type]
      if not n then
        n = 0
      end
    end
  end
  return n
end
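To check what `parse_maxspeed` returns for typical tag values, here is a Python port that aims to mirror the Lua logic line by line. Lua's `%d*` becomes `\d*` and `%a%a:(%a+)` becomes `[a-z]{2}:([a-z]+)` (applied after lowercasing); `MAXSPEED_TABLE` holds only the three entries quoted above, and the values in `MAXSPEED_TABLE_DEFAULT` are placeholders because the excerpt does not show that table.

```python
import re

# The three entries quoted above; the real car.lua table has more.
MAXSPEED_TABLE = {"ch:rural": 80, "ch:trunk": 100, "ch:motorway": 120}
# Placeholder values: the excerpt does not show maxspeed_table_default.
MAXSPEED_TABLE_DEFAULT = {"urban": 50, "rural": 90}

def parse_maxspeed(source):
    """Port of the Lua parse_maxspeed: speed in km/h, or 0 if it cannot be parsed."""
    if not source:
        return 0
    digits = re.match(r"\d*", source).group()  # Lua: source:match("%d*")
    if digits:
        n = int(digits)
        if "mph" in source or "mp/h" in source:
            n = (n * 1609) / 1000  # miles per hour -> km/h, same factor as the Lua code
        return n
    # Parse values like "FR:urban".
    source = source.lower()
    n = MAXSPEED_TABLE.get(source)
    if n is None:
        match = re.search(r"[a-z]{2}:([a-z]+)", source)  # Lua: "%a%a:(%a+)"
        n = MAXSPEED_TABLE_DEFAULT.get(match.group(1)) if match else None
    return n if n is not None else 0
```

With these tables, "25 mph" (the maxspeed of Southeast 23rd Avenue below) gives 40.225, "FR:urban" falls back to the `urban` default, and an unparseable value such as "signals" gives 0.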
@woodbri I would just store the attributes necessary for routing and drop the rest. hstore is nice, but I think we should keep the OSM ID as a reference to the original OSM data. Then someone can use osm2pgsql (which supports hstore) to import the whole data and link between the tables.
There might be cases where we have to split ways from OSM; in that case multiple road segments may point to the same OSM ID.
@dkastl What is the goal? osm2pgsql or osm2pgrouting? If you just load OSM into PostgreSQL, then you just have a big mess. Maybe it is useful to someone, but how are you going to use it for routing?
- How do I use it in a call to pgr_dijkstra(...)?
- How do I decide which edges to load?
- What are the costs for the edges?
- Do I need to node the edges? Which edges?
- Is the topology setup correctly?
- Why does OSRM transform the data?
How the data is represented in the database has a lot to do with how it is going to be used. For pgRouting, all our functions expect data in some kind of structured table with required columns, plus some other constraints, like being noded and having valid topology. If you cannot easily present the data in this form, then it is not useful for pgRouting. So, at a minimum, you need to be able to present the data through a view, if not a table, and I don't think you can solve noding using a view; some routing topology issues might not be solvable via a view either.
How the Lua scripting works is like this: when a way is parsed out of the OSM data, some of its attributes are bound to the Lua environment; then the script is run from C++ and the environment is updated by the script. C++ then evaluates the resulting environment and decides what to do with the way. See: https://github.com/Project-OSRM/osrm-backend/blob/master/extractor/scripting_environment.cpp
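A Python stand-in for that round trip, to show the division of labor: the host creates a result object, hands it and the way's tags to the profile function, then decides from the updated result whether to keep the way. `Result`, `way_function`, `host_process_way`, and the speeds are hypothetical and only loosely modeled on OSRM's profiles; the real binding lives in the C++ file linked above.

```python
class Result:
    """Mutable state the host hands to the profile and inspects afterwards."""

    def __init__(self):
        self.forward_speed = 0.0
        self.backward_speed = 0.0

def way_function(tags, result):
    """Stand-in for a profile script: fills in `result` from the way's tags."""
    if tags.get("oneway") == "reversible":
        return  # leave both speeds at 0, so the host drops the way
    speed = {"residential": 25.0, "secondary": 55.0}.get(tags.get("highway"), 0.0)
    result.forward_speed = speed
    result.backward_speed = 0.0 if tags.get("oneway") == "yes" else speed

def host_process_way(tags, profile_fn):
    """What the host side does: bind attributes, run the script, evaluate the result."""
    result = Result()
    profile_fn(tags, result)
    if result.forward_speed <= 0 and result.backward_speed <= 0:
        return None  # not routable for this profile
    return (result.forward_speed, result.backward_speed)
```

With Southeast Belmont Street's tags (`highway=secondary`, `oneway=yes`) this returns a forward speed and a zero backward speed, i.e. a one-way edge.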
A couple of nodes (from the data in #44):
<node id="40553436" lat="45.5164602" lon="-122.6424960" ... > (first node)
<node id="40611062" lat="45.5157476" lon="-122.6425043" ....> (second node)
Ways related to these nodes:
- A segment (from one node to another node) of Southeast 23rd Avenue:
<way id="5529352" version="12" ......>
  <nd ref="40553436"/> <<<<------- first node
  <nd ref="40501956"/>
  <tag k="highway" v="residential"/>
  <tag k="maxspeed" v="25 mph"/>
  <tag k="name" v="Southeast 23rd Avenue"/>
</way>
- Another segment of Southeast 23rd Avenue that needs to be split (the majority of the streets are like this):
<way id="27401115" version="6" ....>
  <nd ref="40553436"/> <<<<------- first node
  <nd ref="40611062"/> <<<<------- second node
  <nd ref="40611065"/>
  <nd ref="40610057"/>
  <nd ref="40540401"/>
  <nd ref="40460572"/>
  <nd ref="1319678098"/>
  <tag k="highway" v="residential"/>
  <tag k="maxspeed" v="25 mph"/>
  <tag k="name" v="Southeast 23rd Avenue"/>
  <tag k="tiger:cfcc" v="A41"/>
  <tag k="tiger:county" v="Multnomah, OR"/>
  <tag k="tiger:name_base" v="23rd"/>
  <tag k="tiger:zip_left" v="97214"/>
  <tag k="tiger:zip_right" v="97214"/>
</way>
Southeast Belmont Street crosses Southeast 23rd Avenue at node 40553436:
<way id="122480386" version="8" ....>
  <nd ref="40553433"/>
  <nd ref="1368545951"/>
  <nd ref="40553436"/> <<<<------- first node
  <nd ref="1410267936"/>
  <tag k="bicycle" v="designated"/>
  <tag k="cycleway" v="lane"/>
  <tag k="highway" v="secondary"/>
  <tag k="maxspeed" v="30 mph"/>
  <tag k="name" v="Southeast Belmont Street"/>
  <tag k="oneway" v="yes"/>
  <tag k="sidewalk" v="both"/>
</way>
Southeast Yamhill Street also crosses Southeast 23rd Avenue, but at another node, 40611062:
<way id="5534373" version="11" ....>
  <nd ref="40671914"/>
  <nd ref="40671916"/>
  <nd ref="40671918"/>
  <nd ref="40611062"/> <<< second node
  <nd ref="40537610"/>
  <nd ref="40671923"/>
  <nd ref="40571154"/>
  <nd ref="40671926"/>
  <nd ref="40547848"/>
  <nd ref="1354223590"/>
  <nd ref="40671936"/>
  <nd ref="40659342"/>
  <nd ref="40671939"/>
  <nd ref="40612231"/>
  <nd ref="40651685"/>
  <nd ref="1368545928"/>
  <nd ref="40538728"/>
  <nd ref="40671947"/>
  <nd ref="40511837"/>
  <nd ref="40671950"/>
  <nd ref="40493626"/>
  <tag k="highway" v="residential"/>
  <tag k="maxspeed" v="25 mph"/>
  <tag k="name" v="Southeast Yamhill Street"/>
</way>
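The splitting rule these examples illustrate can be sketched in Python: count how many times each node is referenced, then cut each way at every interior node referenced more than once. The node lists are copied from the excerpts above; since this is only part of the extract, the real data would produce more cuts. `split_ways` is a hypothetical name.

```python
from collections import Counter

# Node lists copied from the <nd ref=...> elements above (partial extract).
WAYS = {
    5529352: [40553436, 40501956],
    27401115: [40553436, 40611062, 40611065, 40610057, 40540401, 40460572, 1319678098],
    122480386: [40553433, 1368545951, 40553436, 1410267936],
    5534373: [40671914, 40671916, 40671918, 40611062, 40537610, 40671923, 40571154,
              40671926, 40547848, 1354223590, 40671936, 40659342, 40671939, 40612231,
              40651685, 1368545928, 40538728, 40671947, 40511837, 40671950, 40493626],
}

def split_ways(ways):
    """Cut every way at interior nodes that are referenced more than once.

    Returns a list of (way_id, [node ids]) segments; each one becomes an edge.
    """
    uses = Counter(node for nodes in ways.values() for node in nodes)
    segments = []
    for way_id, nodes in ways.items():
        current = [nodes[0]]
        for i in range(1, len(nodes)):
            current.append(nodes[i])
            is_interior = i < len(nodes) - 1
            if is_interior and uses[nodes[i]] > 1:
                segments.append((way_id, current))
                current = [nodes[i]]  # the shared node starts the next segment
        segments.append((way_id, current))
    return segments

segments = split_ways(WAYS)
```

Way 27401115 is cut at node 40611062, where Southeast Yamhill Street crosses, and Southeast Belmont Street is cut at node 40553436; way 5529352 already ends at that node, so it stays whole.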
When I read the code in https://github.com/Project-OSRM/osrm-backend/blob/master/extractor/scripting_environment.cpp, I don't see when the Lua functions defined in the car profile are used, for example:
local function parse_maxspeed(source)
  if not source then
    return 0
  end
  local n = tonumber(source:match("%d*"))
  if n then
  ......
@dkastl about "if we keep the OSM ID as a reference to the original OSM data. Then someone can use osm2pgsql (supports hstore) to import the whole data and link between the tables": I like this idea. It's a database, and links are joins, so I don't need to have a big Europe table for Foot and another big table for Car.
Whatever we do, the car description in the XML file that is read at the moment can either stay as XML or become Lua data; the new version of the program should work correctly and put into the database the tables/columns that it fills now.
I'm not sure, but the C++ code might be able to define some required script functions, like parse_maxspeed(), that it can call to get a value it needs. Reading a Lua tutorial is probably better than relying on my overview of Lua; these two links demonstrate calling Lua from C++ and calling C++ from Lua:
http://gamedevgeek.com/tutorials/calling-lua-functions/ http://gamedevgeek.com/tutorials/calling-c-functions-from-lua/
Google: C++ lua tutorial
I think it is important to:
- store the way's OSM ID
- store the node's OSM ID, regardless of whether we give it a new node id or not (because pgRouting doesn't support BIGINT yet)
That way, if a user decides to have all the OSM data loaded in another table for, say, rendering purposes, the user will be able to do joins with the nodes table. Maybe the user renders with an application that captures nodes/edges, and a node might be in the routing table; with a JOIN they can get the id we use (singular, for a node) or the ids (plural, for everything that belongs to that OSM ID).
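A Python sketch of keeping OSM ids while routing on compact integer ids: only segment endpoints become vertices, each gets a small integer id, and the OSM node id is kept so a rendering table can be joined back. `build_vertex_table` is a hypothetical helper, and the segments are taken from the Southeast 23rd Avenue and Belmont Street example above.

```python
def build_vertex_table(segments):
    """Map OSM node ids to compact integer vertex ids and build edge rows.

    segments: list of (osm_way_id, [osm node ids]) produced by way splitting.
    Only segment endpoints become vertices; interior nodes are just geometry.
    Returns (vertices, edges) where vertices maps osm_node_id -> vertex id and
    each edge is (edge_id, osm_way_id, source, target).
    """
    vertices = {}

    def vertex_id(osm_node_id):
        # First time an OSM node is seen, give it the next small integer id.
        return vertices.setdefault(osm_node_id, len(vertices) + 1)

    edges = []
    for edge_id, (osm_way_id, nodes) in enumerate(segments, start=1):
        edges.append((edge_id, osm_way_id, vertex_id(nodes[0]), vertex_id(nodes[-1])))
    return vertices, edges

SEGMENTS = [
    (27401115, [40553436, 40611062]),
    (27401115, [40611062, 40611065, 40610057, 40540401, 40460572, 1319678098]),
    (122480386, [40553433, 1368545951, 40553436]),
]
vertices, edges = build_vertex_table(SEGMENTS)
```

Both segments of way 27401115 keep that OSM id, so a join on the way id returns both edge ids, while node 40553436 maps to a single vertex shared by the two streets.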
Options for this problem:
- use XML to describe a car/foot/truck or whatever the user needs to describe
- use a Lua data description to describe a car/foot/truck or whatever the user needs to describe
- right now we have one table with default cost and reverse_cost for the routing description the user needs; the user just modifies the XML file. The minimum requirement is that we generate the edge table as it is generated/used now.
- the extras:
  - a vertex table that holds the OSM ID is handy, but it is not done now
  - one table with columns for car/foot/bicycle?
  - the extra description in another table
- we need to remember that we are using a database to store information, and a design with repeated information in different tables is bad. I think a good design matters if we incorporate the extras.
Maybe this link deserves deeper research; it talks about using Lua with a database: https://www.mapbox.com/blog/osrm-using-external-data/
@cvvergara I totally forgot about this blog post. It's quite old already. As far as I understand this just means that you can manage some stuff in the database that you need for processing, but it still means that OSRM creates a binary format based on that, which is "pre-built".