
Remove unnecessary Tiger data

Open lonvia opened this issue 9 years ago • 4 comments

The address coverage in the US is constantly increasing. It would be nice to be able to remove the less reliable Tiger data where coverage in OSM is complete.

lonvia avatar Apr 15 '15 08:04 lonvia

see also this Help question.

lonvia avatar Apr 15 '15 08:04 lonvia

(Jumping into this as OP of the Help Question)

Ideally there would be some OSM-data-driven method to flag Nominatim to ignore (or purge?) the Tiger address data. Given that OSM would have to have the data in the first place, this seems like a fairly good place to start. That said, I freely admit I'm not especially familiar with the osm->pgsql manipulation or some of the methodologies and reasons behind the schema, but I did have one idea that seemed fairly plausible.

It appears that osm2pgsql tends to drop unusual, non-standard addr:* tags as well as tiger:* tags. If we altered the default osm2pgsql behavior and/or used a new tag on an OSM way (e.g. tiger:address-override => yes), that value would be available in the hstore on the row in the placex table. We could then either a) drop entries from the Tiger data table that are children of that placex row on update (the import of Tiger data would need to be updated as well), or b) modify the SQL used when searching to ignore Tiger data when the parent placex row has that tag in the hstore.

The potential downside is that a lot of new tags would have to be added to ways all over the US, and stored in Postgres, just because the Tiger data is a mess. On the other hand, it would allow for a MapRoulette-esque project to update address info in the US.
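
To make option b) above a bit more concrete, here is a rough Python sketch of the filtering logic only. It is not Nominatim code: `TigerRow`, `usable_tiger_rows` and the in-memory `parent_tags` mapping are made up for illustration, and `tiger:address-override` is the hypothetical tag proposed above, not an existing OSM tag. In practice this would presumably be a condition in the search SQL rather than Python.

```python
from typing import Dict, Iterable, List, NamedTuple

# Hypothetical tag from the proposal above; not an established OSM tag.
OVERRIDE_TAG = "tiger:address-override"


class TigerRow(NamedTuple):
    """Simplified stand-in for one Tiger address range (illustrative only)."""
    parent_place_id: int
    start_number: int
    end_number: int


def usable_tiger_rows(
    tiger_rows: Iterable[TigerRow],
    parent_tags: Dict[int, Dict[str, str]],
) -> List[TigerRow]:
    """Keep only Tiger rows whose parent way is not flagged with the override tag.

    parent_tags maps a parent place id to its tag dictionary, standing in
    for the hstore column on the parent row.
    """
    return [
        row
        for row in tiger_rows
        if parent_tags.get(row.parent_place_id, {}).get(OVERRIDE_TAG) != "yes"
    ]
```

Rows whose parent is unknown or untagged are kept, so the Tiger data only disappears where a mapper has explicitly set the tag.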

bdaroz avatar Apr 17 '15 04:04 bdaroz

@lonvia Is that something for index time (utils/tigerAddressImport.py) or query time? For example, a negative list of county FIPS IDs.

mtmail avatar Oct 22 '17 09:10 mtmail

I was thinking of not importing any data for the counties into the Tiger tables in the first place.
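
A minimal sketch of what such a skip list could look like at import time. This is not the actual logic of utils/tigerAddressImport.py: the function names are hypothetical, and the sketch assumes the county files follow the `tl_<year>_<5-digit county FIPS>_edges` naming pattern.

```python
import re
from typing import Iterable, List, Optional, Set

# Assumed file naming: tl_<year>_<5-digit county FIPS>_edges.<ext>
COUNTY_FILE = re.compile(r"tl_\d{4}_(\d{5})_edges")


def county_fips(filename: str) -> Optional[str]:
    """Return the 5-digit county FIPS code from a file name, or None."""
    match = COUNTY_FILE.search(filename)
    return match.group(1) if match else None


def files_to_import(filenames: Iterable[str], skip_fips: Set[str]) -> List[str]:
    """Drop files for counties on the skip list; keep everything else.

    Files whose names do not match the assumed pattern are kept, so an
    unexpected name never causes data to be silently skipped.
    """
    keep = []
    for name in filenames:
        fips = county_fips(name)
        if fips is None or fips not in skip_fips:
            keep.append(name)
    return keep
```

The skip list itself (which counties count as completely covered in OSM) would still have to be maintained somewhere; the sketch only shows where it would be applied.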

lonvia avatar Oct 22 '17 10:10 lonvia