osmand_map_creation
Add OA data protection
processing_2022-06-23T12:05:20.555361.log:

```
2022-06-25 10:04:12,929 writing osm file for ca/on/oxford-addresses-county.geojson
2022-06-25 10:04:15,981 writing osm file for ca/on/city_of_niagara_falls-addresses-city.geojson
2022-06-25 10:04:21,356 writing osm file for ca/on/city_of_vaughan-addresses-city.geojson
2022-06-25 10:04:23,988 writing osm file for ca/on/city_of_cambridge-addresses-city.geojson
2022-06-25 10:04:26,452 writing osm file for ca/on/northumberland-addresses-county.geojson
2022-06-25 10:04:26,699 pg2osm fileinfo failure: Geometry error: Invalid location. Usually this means a node was missing from the input data.
```
There are multiple other failures like this in the log for other areas.
All of us_az_coconino makes it into the Postgres DB, but none of it makes it into the OSM file. Unable to find bad values in the DB: searched for nulls and 0s, and checked coordinates after converting with ST_X; ST_IsValid returns true for every row.
Unable to find more info on this error in the osmium source code.
It's possible the error is caught late, so osmium may not be the culprit.
None of the addresses in us_az_coconino have address numbers... the program is acting normally, since addresses without numbers are filtered out as worthless. Need to confirm that the other failing files are similar.
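A quick check like the following could confirm whether the other failing sources share the same problem. This is a sketch, not part of the pipeline; it assumes the line-delimited GeoJSON layout OA uses (one Feature per line, with a `number` property as in the us/ri/providence example below).

```python
import json

def count_missing_numbers(path):
    """Count OA features whose 'number' property is empty.

    OA files are assumed to be line-delimited GeoJSON: one Feature per line.
    Returns (total_features, features_missing_a_number).
    """
    total = missing = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            feature = json.loads(line)
            total += 1
            if not feature["properties"].get("number"):
                missing += 1
    return total, missing
```

If `missing == total` for a source, it would explain an empty OSM file the same way us_az_coconino's was explained.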
us_wa_skagit and us_id_bannock also have issues. Looking at OA directly shows the bad data originates there.
Currently, data is taken directly from OA each month for builds. Due to OA's lack of QA and fragile scraping system, there are regularly sources that randomly come back with far less valid data than before, or none at all.
Adding some basic protection seems doable.
- pull OA data into temp Postgres tables
- make sure each row has a number, street, and geometry
- if the current data has the same number of valid rows or more (this criteria may need tweaking), keep the current data
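The validity check and comparison in the list above could be sketched as below. The field names are assumptions based on the OA GeoJSON format, and the "same or more valid" rule is the naive first cut noted above:

```python
def is_valid(feature: dict) -> bool:
    """A row is useful only if it has a number, a street, and a point geometry."""
    props = feature.get("properties", {})
    geom = feature.get("geometry") or {}
    return (
        bool(props.get("number"))
        and bool(props.get("street"))
        and geom.get("type") == "Point"
        and len(geom.get("coordinates", [])) == 2
    )

def use_new_data(current_features, new_features) -> bool:
    """Accept the new pull only if it has at least as many valid rows as
    what is already loaded; otherwise keep the current data."""
    return sum(map(is_valid, new_features)) >= sum(map(is_valid, current_features))
```

A refinement might allow small regressions (e.g. 95% of the current count) so routine source churn doesn't freeze updates.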
A more advanced approach would be matching individual addresses and only adding valid new ones.
Files will be upserted on hash into Postgres, so data will only grow over time.
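The insert-only upsert could look like the following. The real pipeline uses Postgres; sqlite3 is used here only so the sketch is self-contained, and the `ON CONFLICT ... DO NOTHING` syntax is the same in both. Table and column names are assumptions:

```python
import sqlite3

# In-memory stand-in for the Postgres addresses table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE addresses (
        hash   TEXT PRIMARY KEY,
        number TEXT,
        street TEXT
    )
""")

def upsert(rows):
    # Rows whose hash already exists are skipped, so the table only grows
    # and re-running a monthly build never drops existing coverage.
    conn.executemany(
        "INSERT INTO addresses (hash, number, street) VALUES (?, ?, ?) "
        "ON CONFLICT (hash) DO NOTHING",
        rows,
    )

upsert([("8f1963542bf8a05b", "5", "Dresser St")])
upsert([("8f1963542bf8a05b", "5", "Dresser St"),   # already present, skipped
        ("aabbccddeeff0011", "7", "Main St")])      # hypothetical new row
```

Note `DO NOTHING` avoids conflicts with rows already in the table, but Postgres can still error if one batch contains the same hash twice, which is the duplicate problem described below.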
This is blocked by #59 currently.
`INSERT ... ON CONFLICT` is erroring because us/ri/providence contains the same address five times:

```json
{"type":"Feature","properties":{"id":"","unit":"","number":"5","street":"Dresser St","city":"","district":"","region":"","postcode":"","hash":"8f1963542bf8a05b","geometry":{"type":"Point","coordinates":[-71.457058,41.8198175]}}
```
Maybe try inserting with a WHERE clause that skips rows whose hash already matches.
Added in 85c9263cc2fe02ade50975f0e1fc59ede6974fdb: addresses are only inserted and never dropped, so coverage will only improve over time. Duplicates were also removed to allow the unique constraint, since OA doesn't dedupe within each file.
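The within-file dedup can be sketched as a first-occurrence filter on the `hash` property (the helper name is hypothetical; the actual fix lives in the commit above):

```python
def dedupe_by_hash(features):
    """Drop duplicate features within a single OA file, keeping the first
    occurrence of each hash, so inserts can't violate the unique constraint."""
    seen = set()
    out = []
    for feature in features:
        h = feature["properties"]["hash"]
        if h not in seen:
            seen.add(h)
            out.append(feature)
    return out
```

Run against the us/ri/providence case above, the five copies of hash 8f1963542bf8a05b would collapse to one row before the upsert.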