Merge OpenAddresses data into OSM data
OpenAddresses data is far more complete than OSM data, and merging the two datasets would make OSM much more usable. The data is available from openaddresses.io in CSV format, and, if needed, I already have a script for US states that converts it to .osm and merges it with OSM data (.osm format).
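For illustration only, here is a minimal Python sketch of what a CSV-to-.osm conversion can look like. It is not the script mentioned above. It assumes the classic OpenAddresses CSV columns LON, LAT, NUMBER and STREET, and the function name `oa_csv_to_osm` is hypothetical. Each address becomes a standalone OSM node carrying `addr:housenumber` and `addr:street` tags.

```python
import csv
import io
from xml.sax.saxutils import quoteattr


def oa_csv_to_osm(csv_text: str) -> str:
    """Convert OpenAddresses CSV rows into an OSM XML string.

    Assumes the classic OpenAddresses column names LON, LAT, NUMBER and
    STREET; rows missing a house number or street are skipped.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<osm version="0.6" generator="oa2osm-sketch">']
    node_id = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        number = (row.get("NUMBER") or "").strip()
        street = (row.get("STREET") or "").strip()
        if not number or not street:
            continue
        node_id -= 1  # negative ids mark objects not yet in the OSM database
        # quoteattr() escapes the value and wraps it in quotes
        lines.append(f'  <node id="{node_id}" lat={quoteattr(row["LAT"])} lon={quoteattr(row["LON"])}>')
        lines.append(f'    <tag k="addr:housenumber" v={quoteattr(number)}/>')
        lines.append(f'    <tag k="addr:street" v={quoteattr(street)}/>')
        lines.append('  </node>')
    lines.append('</osm>')
    return "\n".join(lines)
```

Negative node ids are the usual convention for objects that do not exist in the OSM database yet, which keeps them from clashing with real OSM ids when the two files are merged.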
Looking at OpenAddresses, what we have is essentially a mapping:
(street name, house number) ---> coordinates
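A minimal Python sketch of that mapping, assuming the classic OpenAddresses CSV columns LON, LAT, NUMBER and STREET (the function name `build_address_map` is hypothetical):

```python
import csv
import io
from typing import Dict, Tuple

# Key: (street name, house number); value: (lat, lon)
AddressMap = Dict[Tuple[str, str], Tuple[float, float]]


def build_address_map(csv_text: str) -> AddressMap:
    """Build the (street, number) -> coordinates map from OpenAddresses CSV.

    Column names LON/LAT/NUMBER/STREET are assumed from the classic OA layout.
    Nothing here records which city or region an address belongs to.
    """
    result: AddressMap = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        street = (row.get("STREET") or "").strip()
        number = (row.get("NUMBER") or "").strip()
        if street and number:
            result[(street, number)] = (float(row["LAT"]), float(row["LON"]))
    return result
```

Because the key carries no town, two towns that share a street name and house number would overwrite each other in this map.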
At least for Finnish addresses, the town and other hierarchy information is missing. The server's geocoder relies on hierarchy information: each coordinate is associated with a city, region, country, and so on. So, on import, we will have to determine those as well.
I want to write a new importer, as the current one depends on an outdated version of a library. So, it will be practical to work on this after that.
Ideally, what is needed is some kind of tool that takes OSM/OpenAddresses data and organizes it hierarchically, allowing me to query the hierarchy of objects and write them in my own format for https://github.com/rinigus/geocoder-nlp .
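To make the idea concrete, here is a hypothetical Python sketch of "organize hierarchically, then write out". It assumes each record already has country, region, city, street and house number resolved; the output rows `(id, parent_id, level, name)` are an illustrative format, not geocoder-nlp's actual one, and coordinates are left out to keep it short.

```python
from itertools import count
from typing import Iterator, List, Optional, Tuple

# Hypothetical hierarchy levels, from the top down
LEVELS = ("country", "region", "city", "street", "housenumber")


def build_tree(records: List[dict]) -> dict:
    """Nest flat address records as country > region > city > street > number."""
    root: dict = {}
    for rec in records:
        node = root
        for level in LEVELS:
            node = node.setdefault(rec[level], {})
    return root


def walk(tree: dict) -> Iterator[Tuple[int, Optional[int], str, str]]:
    """Yield (id, parent_id, level, name) depth-first, so parents precede children."""
    ids = count(1)

    def visit(node: dict, depth: int, parent: Optional[int]):
        for name in sorted(node):
            obj_id = next(ids)
            yield obj_id, parent, LEVELS[depth], name
            yield from visit(node[name], depth + 1, obj_id)

    yield from visit(tree, 0, None)
```

Emitting parents before children means a writer can resolve every `parent_id` as soon as it sees a row.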
https://github.com/openaddresses/
It looks like they might already have tools that do what you need. Funnily enough, after I made my own oa2osm tool (in Bash), they made their own in JavaScript. Theirs is most likely better, and may have the hierarchies defined for all global datasets.
I don't think I follow. It's also not clear how to merge the data sources and ensure the hierarchy; it's a tricky problem. For hierarchies, we may have to use some extra tools, as I do now, or maybe the one made for the libpostal data import, or something else.
Where can I see an example of the hierarchy that's required? If I know exactly what format the data needs to be in, I may be able to write a script to convert OpenAddresses data into that format.
@rinigus, let me know if you need a point-in-polygon service for this; it can be useful for building hierarchies based only on lat/lon: https://spatial.demo.geocode.earth
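The principle behind such a service can be sketched locally in Python with the even-odd ray casting test. This is not the geocode.earth service or its API; the admin-area list and function names are hypothetical, and a real import would need multipolygons with holes plus a spatial index to be fast enough.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test against a single polygon ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def hierarchy_for(pt: Point, areas: List[Tuple[str, str, Sequence[Point]]]) -> List[Tuple[str, str]]:
    """Return (level, name) for every admin area whose polygon contains pt."""
    return [(level, name) for level, name, ring in areas if point_in_polygon(pt, ring)]
```

Running `hierarchy_for` on each address coordinate yields the city/region/country chain the geocoder needs, using nothing but lat/lon.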
Just an FYI: there are 500+ million OA records, so even if this operation takes only 1 ms per record, it will run for ~5.8 days 😱
I've got a bunch of servers that I could split it up between (my boss would let me, I think lol)
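A quick back-of-the-envelope check of that estimate, and of how splitting across servers would scale it, assuming perfectly linear scaling with no coordination overhead (`import_days` is a hypothetical helper):

```python
def import_days(records: int, ms_per_record: float, workers: int = 1) -> float:
    """Wall-clock days for an ideally parallel import (no coordination overhead)."""
    seconds = records * ms_per_record / 1000 / workers
    return seconds / 86_400  # seconds per day


print(round(import_days(500_000_000, 1.0), 2))              # single machine
print(round(import_days(500_000_000, 1.0, workers=10), 2))  # ten servers
```

500,000,000 ms is 500,000 s, or about 5.79 days on one machine and about 0.58 days across ten, under that ideal-scaling assumption.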
Quite an impressive number; I will keep that in mind. The issue is that we have to import in one go, as the import composes a database that is later distributed to users on mobile. So, I cannot postpone resolving hierarchies unless I change the format of the data. As for servers/services, I'd better keep it in-house as long as possible. Imports are tedious, but under control so far, and could probably be optimized further.
I will have to start working on the geocoder again, but, unfortunately, many other things take priority, and I cannot promise any timeline for it at this moment.
PS, note to self: look into Pelias as well, to learn from it.
Any update on this, as far as the prerequisite work that needs to be done before this can be tackled? Just curious, not trying to rush.
No progress so far; I have been working on an unrelated project. Hopefully I can get that other project to a reasonable stage soon and then split my time with the work on maps. On the maps side, the navigation rewrite for Pure Maps would take precedence over this, as I have already been researching that rewrite. So, by the look of it, I will not be able to start on this any time soon.
Just a heads-up: I checked the OpenAddresses data again, and they recently changed the main data format from CSV to JSON, so that might make things easier.
As the import is now done via Nominatim, I am closing this issue here. As soon as Nominatim adds that source, I can add it to OSM Scout Server as well.
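As a hedged sketch of why JSON might help, the same (street, number) -> coordinates map can be built from newline-delimited GeoJSON. This assumes each line is a GeoJSON Point Feature with lowercase `street` and `number` properties; check the actual OpenAddresses output before relying on those names. `build_address_map_geojson` is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Tuple


def build_address_map_geojson(lines: Iterable[str]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Build (street, number) -> (lat, lon) from newline-delimited GeoJSON features.

    Assumes Point Features with "street" and "number" properties.
    """
    result: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        feature = json.loads(line)
        props = feature.get("properties") or {}
        geom = feature.get("geometry") or {}
        street = (props.get("street") or "").strip()
        number = (props.get("number") or "").strip()
        if geom.get("type") == "Point" and street and number:
            lon, lat = geom["coordinates"][:2]  # GeoJSON order is [lon, lat]
            result[(street, number)] = (float(lat), float(lon))
    return result
```

Unlike CSV, each feature carries explicit typed geometry, so no column-name guessing is needed for the coordinates.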