Method for deduplicating supplemental data sets
When importing a dataset of e.g. NaviLens-enabled bus stops, some of those data points will duplicate bus stops already marked in OpenStreetMap. The proper way to deduplicate is probably to define a distance threshold, e.g. check whether a bus stop already exists in the database within 5 m. If one does, add the proper tags (e.g. navilens=true) to the existing point rather than creating a new record in non_osm_data.
As flagged in #144.
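A minimal sketch of the distance-threshold check described above, in Python. The dictionary keys (`lat`, `lon`), the function names, and the 5 m default are assumptions for illustration, not an existing Soundscape API. It uses the haversine formula and a brute-force scan, which should be fine for a first pass on one city.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_existing_stop(candidate, osm_stops, threshold_m=5.0):
    """Return the nearest OSM stop within threshold_m of candidate, or None.

    Both candidate and each entry of osm_stops are assumed to be dicts
    with "lat" and "lon" keys (hypothetical record shape).
    """
    best, best_d = None, threshold_m
    for stop in osm_stops:
        d = haversine_m(candidate["lat"], candidate["lon"], stop["lat"], stop["lon"])
        if d <= best_d:
            best, best_d = stop, d
    return best
```

If `find_existing_stop` returns a stop, the import would tag that stop; if it returns `None`, the point would go into non_osm_data as a new record.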
I believe this should also apply to any point of interest, e.g. supermarkets, places of worship, etc. That way, if we get non-OpenStreetMap data for these types of POIs and someone updates OpenStreetMap in the future, the non-OSM version gets replaced with the OSM version. Something like that. There are edge cases that might be tricky. For example, one service may list a place of worship as "Church" while another service may list the same place with the denomination, e.g. "Catholic Church". We know that they are the same place because they share the same name and they are at the same location.
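To make the "Church" vs. "Catholic Church" case concrete, here is a rough sketch of one possible rule: treat two records as the same place when the names match, the points are close together, and one category label is a more specific version of the other. The record fields (`name`, `category`, `lat`, `lon`), the 25 m threshold, and the token-subset rule are all assumptions for illustration.

```python
import math


def _tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.casefold().split())


def categories_compatible(a, b):
    """True if one label's words are a subset of the other's,
    e.g. "Church" vs. "Catholic Church"."""
    ta, tb = _tokens(a), _tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate over a few hundred metres."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(x, y)


def same_poi(p, q, threshold_m=25.0):
    """Heuristic match: same name, nearby, and compatible categories."""
    same_name = p["name"].casefold().strip() == q["name"].casefold().strip()
    close = approx_distance_m(p["lat"], p["lon"], q["lat"], q["lon"]) <= threshold_m
    return same_name and close and categories_compatible(p["category"], q["category"])
```

A rule this simple will miss spelling variants and abbreviations in names, so it is only a starting point for the tricky cases.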
I think the ideal way to solve this would be for Navilens to contribute directly to OpenStreetMap. It would benefit OSM because e.g. the bus stop data would be more complete. It would work well for Soundscape Community, but also for other navigation apps. It would be good marketing for Navilens if the tag was a URI pointing to the code on their web site because other OSM apps could show a link.
Deduplicating single points like bus stops could work, although I imagine there will be edge cases, such as when two stops are very close together. Buildings could become a lot more complicated because they are usually an area rather than a point, possibly with a list of entrances, whereas the NaviLens code would be a specific point.
Thanks. If NaviLens can already be represented in OpenStreetMap with the tools available, is there a way that we could add the info to OSM ourselves as a head start? Or should we open up another project, like a NaviLens tool for OSM? Maybe I could pitch this idea directly to OSM, since I contributed to OSM last year by adding new places to my own city map.
I think if they agree it is a good idea, someone from Navilens should start a discussion on the OSM forum to get agreement about a Navilens tag.
I just published an entry in my user diary on OSM suggesting that they should implement a Navilens tag, sort of like a true/false or yes/no tag. I hope it gets noticed, but now I'll just wait patiently.
Here is the link to my user diary entry on Open Street Map. If we could create a tool that could automatically add tags for Navilens to existing OSM bus stops or add QR Code tags that are already used in OSM or something, we might get somewhere without worrying about deduplicating as much. https://www.openstreetmap.org/user/John%20Joseph%20A%20Gatchalian/diary/406296
Thanks for starting the discussion with the OSM community -- there is some useful insight there about the tag structure, which I hadn't put much thought into. It's certainly easiest to change now before we've deployed anything, even in our separate database.
Since it appears that some 90% of bus stops aren't marked in OSM for some relevant US cities, we'd largely need to add whole data points, not just annotate existing ones. Whether or not NaviLens can bulk-import this into OSM will depend on their data licensing terms.
I can try to look into this issue
Is there a way to get to the software for this issue? As far as I'm aware, the OSM software is not integrated into the Soundscape guide dogs library yet.
The first pass for this task doesn't need to integrate with any existing Soundscape code. All you need is the NaviLens data set and a relevant subset of OSM data.
OSM data can be queried publicly without spinning up Soundscape's backend, which simply hosts a copy. Here's a query that should return all bus stops in San Antonio: https://turbo.overpass.private.coffee/?Q=%5Bout%3Ajson%5D%0A%2F%2F%5Bout%3Acsv%28%3A%3Aid%2Cname%2C%3A%3Alat%2C%3A%3Alon%3B%20true%3B%20%22%2C%22%29%5D%0A%5Btimeout%3A25%5D%3B%0A%28%0A%20%20node%5B%22highway%22%3D%22bus_stop%22%5D%28%7B%7Bbbox%7D%7D%29%3B%0A%29%3B%0Aout%20body%3B%0A%3E%3B%0Aout%20skel%20qt%3B&C=29.45335%3B-98.567361%3B10
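For reference, the same kind of query can be run from a script instead of the Overpass Turbo web page. This is a sketch only: the endpoint shown is the public overpass-api.de instance (an assumption; any Overpass instance should accept the same request), the `{{bbox}}` macro from Overpass Turbo is replaced by an explicit bounding box that you would have to supply, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Overpass endpoint; swap in another instance if preferred.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_bus_stop_query(south, west, north, east, timeout_s=25):
    """Overpass QL for all highway=bus_stop nodes inside a bounding box.

    Overpass expects the box as (south, west, north, east).
    """
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'node["highway"="bus_stop"]({south},{west},{north},{east});\n'
        "out body;"
    )


def parse_bus_stops(response):
    """Reduce an Overpass JSON response to id/lat/lon/name dicts."""
    stops = []
    for el in response.get("elements", []):
        if el.get("type") != "node":
            continue
        stops.append({
            "id": el["id"],
            "lat": el["lat"],
            "lon": el["lon"],
            "name": el.get("tags", {}).get("name"),  # may be None
        })
    return stops


def fetch_bus_stops(south, west, north, east):
    """POST the query to the Overpass API and return the parsed stops."""
    query = build_bus_stop_query(south, west, north, east)
    body = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=60) as resp:
        return parse_bus_stops(json.load(resp))
```

Calling `fetch_bus_stops(south, west, north, east)` with a box around the city returns a list that can be fed straight into a matching script.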
Between that output and the CSV I shared with you, you should be all set to start from a blank Python script. Just try to formulate an algorithm that classifies each NaviLens point as either present in OSM or not. Since there are approximately 10 NaviLens points for each OSM point, I'd expect all OSM points to match a NaviLens point, but 90% of NaviLens points to have no corresponding OSM point.
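As a possible starting point, here is a sketch of such a classifier. It assumes both data sets have already been loaded into lists of dicts with `lat` and `lon` keys (plus an `id` on the OSM side); the actual CSV column names aren't shown in this thread, so those keys are placeholders. The 5 m threshold is also just a guess to tune against the data.

```python
import math


def distance_m(a, b):
    """Haversine distance in metres between two dicts with lat/lon keys."""
    phi1, phi2 = math.radians(a["lat"]), math.radians(b["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(b["lon"] - a["lon"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def classify(navilens_points, osm_stops, threshold_m=5.0):
    """Split NaviLens points into those with and without a nearby OSM stop.

    Returns (matched, unmatched, orphan_osm):
      matched    - list of (navilens_point, osm_stop) pairs
      unmatched  - NaviLens points with no OSM stop within threshold_m
      orphan_osm - OSM stops that no NaviLens point matched
    """
    matched, unmatched = [], []
    matched_osm_ids = set()
    for p in navilens_points:
        # Brute-force nearest neighbour; fine for a first pass on one city.
        nearest = min(osm_stops, key=lambda s: distance_m(p, s), default=None)
        if nearest is not None and distance_m(p, nearest) <= threshold_m:
            matched.append((p, nearest))
            matched_osm_ids.add(nearest["id"])
        else:
            unmatched.append(p)
    orphan_osm = [s for s in osm_stops if s["id"] not in matched_osm_ids]
    return matched, unmatched, orphan_osm
```

If the expectation above holds, `orphan_osm` should come back nearly empty and `unmatched` should hold roughly 90% of the NaviLens points. A large `orphan_osm` list would suggest the threshold is too tight or the coordinates are offset.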