Nominatim
Remove unnecessary Tiger data
The address coverage in the US is constantly increasing. It would be nice to be able to remove the less reliable Tiger data where coverage in OSM is complete.
See also this Help question.
(Jumping into this as OP of the Help Question)
I think it would be ideal to have some OSM-data-driven method to flag Nominatim to ignore (or purge?) the Tiger address data. Given that OSM would have to have the data in the first place, this seems like a fairly good place to start. That said, I freely admit I'm not exceptionally familiar with the osm->pgsql manipulation or some of the methodologies and reasons behind the schema, but I did have one idea that seemed fairly plausible.
It appears that osm2pgsql tends to drop unusual, non-standard addr:* tags as well as tiger:* tags. If we were to alter the default osm2pgsql behavior and/or use a new tag on an OSM way (e.g. tiger:address-override => yes), that value would be available on the row in the placex table (in the hstore). We could then either a) drop entries from the Tiger data table that are children of that placex row on update (the import of Tiger data would need to be updated as well), or b) modify the SQL used when searching to ignore Tiger data when the parent placex row has that tag in the hstore.
The potential downside is a lot of new tags on ways all over the US that would need to be stored in Postgres just because the Tiger data is a mess. On the other hand, it would allow for a MapRoulette-esque project to update address info in the US.
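To make option b) a bit more concrete, here is a rough in-memory sketch of the filtering logic in Python. It is not Nominatim code: the dictionary keys (`place_id`, `parent_place_id`, `extratags`) and the function name are assumptions for illustration only, and the `tiger:address-override` tag is just the proposal from above, not an existing OSM tag.

```python
# Hypothetical sketch: rows are plain dicts standing in for database rows.
# The key names and the tag name are assumptions, not Nominatim's schema.

OVERRIDE_TAG = "tiger:address-override"  # proposed tag, does not exist yet


def usable_tiger_rows(tiger_rows, placex_rows, tag=OVERRIDE_TAG):
    """Return the Tiger rows whose parent street is NOT flagged as overridden."""
    # Collect the ids of all streets that carry the override tag.
    overridden = {
        p["place_id"]
        for p in placex_rows
        if (p.get("extratags") or {}).get(tag) == "yes"
    }
    # Keep only Tiger rows whose parent is not in that set.
    return [t for t in tiger_rows if t["parent_place_id"] not in overridden]
```

In a real implementation the same condition would presumably live in the search SQL (or in a cleanup step on update) rather than in Python.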
@lonvia Is that something for index time (utils/tigerAddressImport.py) or query time? E.g. negative list of county fips ids.
I was thinking of not importing any data for the counties into the Tiger tables in the first place.
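Skipping counties at import time could look roughly like the sketch below. It assumes the Tiger edge files follow the `tl_<year>_<5-digit county FIPS>_edges` naming pattern; the negative list contents and the function names are made up for illustration and are not part of utils/tigerAddressImport.py.

```python
import re

# Hypothetical negative list: counties where OSM address coverage is
# considered complete. The values here are placeholders only.
EXCLUDED_COUNTY_FIPS = {"06075", "36061"}

# Assumed file naming pattern: tl_<year>_<state+county FIPS>_edges
_FIPS_RE = re.compile(r"tl_\d{4}_(\d{5})_edges")


def county_fips(filename):
    """Extract the 5-digit county FIPS code from a file name, or None."""
    match = _FIPS_RE.search(filename)
    return match.group(1) if match else None


def files_to_import(filenames, excluded=EXCLUDED_COUNTY_FIPS):
    """Drop files that belong to a county on the negative list."""
    return [f for f in filenames if county_fips(f) not in excluded]
```

Files whose names do not match the pattern are kept, so an unexpected file name never silently removes data from the import.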