Merge OpenAddress data into OSM data

Open joshbowyer opened this issue 5 years ago • 10 comments

OpenAddresses data is far more complete than OSM data, and merging the two datasets makes OSM very usable. The data is available from openaddresses.io in CSV format, and if needed, I already have a script that converts it to .osm and merges it with OSM data (.osm format) for US states.
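
To illustrate the CSV to .osm step, here is a minimal sketch in Python. It is not the script mentioned above. The column names (`LON`, `LAT`, `NUMBER`, `STREET`, `CITY`, `POSTCODE`) are an assumption about the OpenAddresses CSV layout and should be checked against the actual files; the function name and the `generator` value are hypothetical. The `addr:*` keys are the usual OSM address tags, and negative ids follow the common convention for objects that do not exist in OSM yet.

```python
import csv
import io
import xml.etree.ElementTree as ET


def oa_csv_to_osm(csv_text):
    """Convert OpenAddresses-style CSV text into an OSM XML string."""
    root = ET.Element("osm", version="0.6", generator="oa2osm-sketch")
    next_id = -1  # negative ids: objects not (yet) present in OSM
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows without a house number or street are useless for geocoding.
        if not row.get("NUMBER") or not row.get("STREET"):
            continue
        node = ET.SubElement(
            root, "node", id=str(next_id), lat=row["LAT"], lon=row["LON"]
        )
        tags = {
            "addr:housenumber": row["NUMBER"],
            "addr:street": row["STREET"],
            "addr:city": row.get("CITY"),          # assumed column name
            "addr:postcode": row.get("POSTCODE"),  # assumed column name
        }
        for key, value in tags.items():
            if value:  # skip empty or missing fields
                ET.SubElement(node, "tag", k=key, v=value)
        next_id -= 1
    return ET.tostring(root, encoding="unicode")
```

The resulting file could then be merged with an OSM extract using whatever tool is already part of the import pipeline; deduplicating addresses that exist in both sources is a separate problem that this sketch does not address.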

joshbowyer avatar Jun 03 '20 20:06 joshbowyer

When looking into OpenAddresses, we are talking about a mapping

street name, house number ---> coordinates

At least in Finnish addresses, town and other hierarchy info is not there. In the server's geocoder, I am using hierarchy info. Namely, each coordinate is associated with city, region, country and so on. So, on import, we will have to determine those as well.

I want to write a new importer for the data, as the current one depends on an outdated version of a library. So it will be practical to work on this after that.

Ideally, what is needed is some kind of tool that will take OSM/OpenAddresses data and organize it in a hierarchical manner, allowing me to query the hierarchy of objects and write them in my own format for https://github.com/rinigus/geocoder-nlp .

rinigus avatar Jun 06 '20 07:06 rinigus

https://github.com/openaddresses/

It looks like they might already have tools that do what you need. The funny thing is that after I made my own oa2osm tool (in bash), they made their own in JavaScript. Theirs is most likely better, and may have the hierarchies defined for all global datasets.

joshbowyer avatar Jun 10 '20 14:06 joshbowyer

I don't think I follow. It's also not clear how to merge the data sources and ensure the hierarchy; it's a tricky problem. For hierarchies, we may have to use some extra tools, as I do now, or maybe the one made for the libpostal data import, or something else.

rinigus avatar Jun 10 '20 16:06 rinigus

Where can I see an example of the hierarchy that's required? If I know exactly how the expected data needs to be formatted, I may be able to write a script to convert OpenAddresses data into that format.

joshbowyer avatar Jun 10 '20 16:06 joshbowyer

Let me know if you need a point-in-polygon service for this, @rinigus; it can be useful for building hierarchies based only on lat/lon: https://spatial.demo.geocode.earth
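
As a rough illustration of building a hierarchy from lat/lon alone, here is a minimal point-in-polygon sketch in Python using ray casting. It is not how the linked service or the geocoder works; the function names are hypothetical, the boxes in `AREAS` are made-up toy rectangles rather than real boundaries, and the `level` values only mimic OSM's `admin_level` ordering (lower means coarser). Real boundaries would also need holes, multipolygons and a spatial index.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def hierarchy_for(lon, lat, areas):
    """Names of all areas containing the point, coarsest level first."""
    hits = [a for a in areas if point_in_polygon(lon, lat, a["ring"])]
    return [a["name"] for a in sorted(hits, key=lambda a: a["level"])]


# Toy data: crude bounding boxes, NOT real administrative boundaries.
AREAS = [
    {"name": "Helsinki", "level": 8,
     "ring": [(24.8, 60.1), (25.3, 60.1), (25.3, 60.3), (24.8, 60.3)]},
    {"name": "Uusimaa", "level": 4,
     "ring": [(23.0, 59.8), (26.5, 59.8), (26.5, 60.9), (23.0, 60.9)]},
    {"name": "Finland", "level": 2,
     "ring": [(20.0, 59.5), (31.5, 59.5), (31.5, 70.1), (20.0, 70.1)]},
]

print(hierarchy_for(24.94, 60.17, AREAS))  # ['Finland', 'Uusimaa', 'Helsinki']
```

This naive version tests every area for every point, so its cost grows with the number of boundaries times the number of addresses.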

Just an FYI, there are 500 million+ OA records, so even if this operation takes only 1 ms per record, it will run for ~5.8 days 😱
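
The estimate is simple arithmetic and can be reproduced with a few lines of Python. The function name and the `workers` parameter are hypothetical; dividing by `workers` assumes the work splits perfectly evenly, which is an idealization.

```python
def import_days(records, ms_per_record, workers=1):
    """Wall-clock days needed to process `records` at a fixed per-record cost."""
    seconds = records * ms_per_record / 1000 / workers
    return seconds / 86_400  # seconds per day


print(round(import_days(500_000_000, 1), 2))  # 5.79
```

Halving the per-record time or doubling the number of workers halves the total, so the per-record cost is the number to watch.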

missinglink avatar Jun 10 '20 17:06 missinglink

I've got a bunch of servers that I could split it up between (my boss would let me, I think, lol)

joshbowyer avatar Jun 10 '20 17:06 joshbowyer

Quite an impressive number, I will keep that in mind. The issue is that we have to import in one go, as the import composes a database that is later distributed to users on mobile. So I cannot postpone resolving hierarchies unless I change the data format. As for servers/services, I would rather keep it in house as long as possible. Imports are tedious, but under control so far, and could probably be optimized further.

I will have to start working on the geocoder again, but many things get in the way, unfortunately, and I cannot promise any timeline for it at the moment.

PS: note to myself: Will have to look into Pelias as well, to learn from it.

rinigus avatar Jun 10 '20 18:06 rinigus

Any update on this, in particular on the prerequisite work that has to be done before this can be tackled? Just curious, not trying to rush.

joshbowyer avatar Oct 12 '20 01:10 joshbowyer

No progress so far; I have been working on an unrelated project. Hopefully I can get that project to a reasonable stage soon and then split my time with the work on maps. On the maps side, the rewrite of navigation for Pure Maps would take precedence over this, as I have already been researching that rewrite. So, by the look of it, I will not be able to start working on this soon.

rinigus avatar Oct 12 '20 05:10 rinigus

Just a heads up, I went to check the OpenAddresses stuff again and they recently changed the main format of the data to JSON instead of CSV, so that might make things easier.

joshbowyer avatar Nov 30 '20 14:11 joshbowyer

As the import is now done via Nominatim, I am closing this issue. As soon as Nominatim adds that source, I can add it to OSM Scout Server as well.

rinigus avatar Jan 27 '23 18:01 rinigus