Support house numbers without a corresponding street name

Open JiriBalcar opened this issue 8 years ago • 30 comments

Hey team!

I was using your awesome geocoding engine when I noticed something interesting. Let me tell you more about it.


Here's what I did :innocent:

I noticed that search does not return addresses in rural areas of the Czech Republic. What all these addresses have in common is that they are in villages without streets.

Some examples:

  • Dolní Čermná 314, CZ
  • Dolní Dobrouč 378, CZ
  • Kolešovice 303, CZ

Here's what I got :scream_cat:

I am not able to geocode, reverse geocode or autocomplete these houses.


Here's what I was expecting :sparkles:

I expect that even if a house does not have a street, we should be able to geocode it.


Here's what I think could be improved :trophy:

I found that both import pipelines, for openstreetmap and openaddresses, check whether a street exists: see address_extractor.js and isValidCsvRecord.js.

JiriBalcar avatar Sep 06 '17 12:09 JiriBalcar

Hi @JiriBalcar - thanks for the report!

I am linking this issue to your support ticket (Desk 955).

rmglennon avatar Sep 08 '17 23:09 rmglennon

We currently do not support street-less addresses. We will try to prioritize this feature in the next 6 months. Thanks for reporting it.

dianashk avatar Sep 12 '17 16:09 dianashk

Related to https://github.com/pelias/interpolation/issues/27

dianashk avatar Sep 12 '17 16:09 dianashk

Hi, has there been any progress on this issue?

JiriBalcar avatar Oct 30 '17 08:10 JiriBalcar

@dianashk do you have a high-level analysis of what should be done to allow imports of streetless addresses? Could it be solved by filling the street with the city name for these addresses (at least for the Czech Republic)? I would like to help with this, but I do not know where to start.

JiriBalcar avatar Nov 03 '17 07:11 JiriBalcar

@JiriBalcar, I think we probably would try to patch it by filling the street name with city name since it acts similarly in the case of the Czech Republic. We would also need to update the pelias-labels module to create the labels properly and not repeat the city name twice. Would you happen to have some time to contribute to the project? Let us know what we can do to help you get started.

dianashk avatar Nov 03 '17 13:11 dianashk

@dianashk I will try to create a pull request for this fix. Should filling the street with the city name apply only to Czechia, or globally?

JiriBalcar avatar Nov 03 '17 15:11 JiriBalcar

Any update on this feature? In Poland, there are also many villages without streets.

arqo123 avatar Oct 28 '18 10:10 arqo123

Can you please provide some example OSM ids and the corresponding queries you would like to match so we can better understand?

We currently only import OSM addresses where both addr:street and addr:housenumber tags are set.

I'd be interested in trying to support these streetless towns if possible, but we need more information about which countries this applies to and how common it is to have streetless addresses before we can investigate.

missinglink avatar Oct 28 '18 20:10 missinglink

Thanks for your response! Here are a few addresses from Poland. I can find all of them with Nominatim. In the next few days I will provide information about other European countries.

1* Address : Kaszowo 29 osmId : 410429784 query : Kaszowo 29 or 29 Kaszowo ( works in both situations ) nominatim result : https://nominatim.openstreetmap.org/search?q=Kaszowo%2029%20&format=json&addressdetails=1

2* Address : Liszki 7, 32-060 osmId : 215313396 query : "7 Liszki 32-060" or "Liszki 7 32-060" ( works in both situations ) nominatim result : https://nominatim.openstreetmap.org/search?q=%20Liszki%207%2032-060&format=json&addressdetails=1

3* Address : Liszki 425, 32-060 osmId : 260069348 query : "425 Liszki 32-060" or "Liszki 425 32-060" ( works in both situations ) nominatim result : https://nominatim.openstreetmap.org/search?q=425%20Liszki%2032-060&format=json&addressdetails=1

4* Address : Stawiec 37 osmId : 410291313 query : "Stawiec 37" or "37 Stawiec" ( works in both situations ) nominatim result : https://nominatim.openstreetmap.org/search?q=Stawiec%2037&format=json&addressdetails=1

The locality "Liszki" is not a small one. In OSM you can see that "Liszki" is not streetless, but the streets were added recently and probably not all the buildings are related to them yet.

In my opinion the best way is to query with the postcode, so the query could look like [housenumber] [locality] [postcode].
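
To make that pattern concrete, here is a minimal sketch of splitting such a query into its parts. It is only an illustration, not how Pelias parses input: the function name is hypothetical, and it assumes Polish-style postcodes (NN-NNN) and a numeric house number with an optional trailing letter.

```javascript
// Hypothetical helper: split a streetless query into its parts.
// Assumes Polish-style postcodes (NN-NNN) and a numeric house number
// with an optional trailing letter. Not how Pelias parses input.
function parseStreetlessQuery(text) {
  const tokens = text.trim().split(/\s+/);
  const result = { housenumber: null, locality: null, postcode: null };
  const rest = [];

  for (const token of tokens) {
    if (result.postcode === null && /^\d{2}-\d{3}$/.test(token)) {
      result.postcode = token;
    } else if (result.housenumber === null && /^\d+[a-zA-Z]?$/.test(token)) {
      result.housenumber = token;
    } else {
      rest.push(token);
    }
  }

  // Whatever is left over is treated as the locality name.
  result.locality = rest.length > 0 ? rest.join(' ') : null;
  return result;
}

console.log(parseStreetlessQuery('425 Liszki 32-060'));
console.log(parseStreetlessQuery('Liszki 425 32-060'));
```

Both token orders give the same house number, locality and postcode, which is the behaviour the Nominatim examples above show.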

EDIT 1 And addresses from other countries

5* Address : Chotča 123 Country: Slovakia osmId : 71712027 query : "Chotča 123"
nominatim result : https://nominatim.openstreetmap.org/search?q=Chotca%20123&format=json&addressdetails=1

6* Address : Habartice 81 Country: Czech Republic osmId : 840401654 query : "Habartice 81"
nominatim result : https://nominatim.openstreetmap.org/search?q=Habartice%2081&format=json&addressdetails=1

EDIT 2 I've imported all the OpenAddresses data into MongoDB. A distinct query shows that in Poland we have 31491 streetless towns :< I can also do this for a few other European countries if needed.

streetless_poland.txt

arqo123 avatar Oct 28 '18 22:10 arqo123

I can add some more from Czech Republic:

Address : Dolní Čermná 314 Country: Czech Republic osmId : 2453009527 query : "Dolní Čermná 314" https://nominatim.openstreetmap.org/search?q=Doln%C3%AD%20%C4%8Cermn%C3%A1%20314&format=json&addressdetails=1

Address : Kolešovice 303 Country: Czech Republic osmId : 2871145297 query : "Kolešovice 303" https://nominatim.openstreetmap.org/search?q=Kole%C5%A1ovice%20303&format=json&addressdetails=1

Address : Horní Čermná 174 Country: Czech Republic osmId : 2454122222 query : "Horní Čermná 174" https://nominatim.openstreetmap.org/search?q=Horn%C3%AD%20%C4%8Cermn%C3%A1%20174&format=json&addressdetails=1

Address : Jehnědí 40 Country: Czech Republic osmId : 2447918405 query : "Jehnědí 40" https://nominatim.openstreetmap.org/search?q=Jehn%C4%9Bd%C3%AD%2040&format=json&addressdetails=1

JiriBalcar avatar Oct 29 '18 06:10 JiriBalcar

Thanks for the examples, I've changed the title of this issue to better describe the problem.

Our data model is based on the idea that each address has a corresponding named way. This assumption is logical since the postal service needs to identify the route in order to gain access to the property for deliveries.

From what I understand of the discussion above, you're saying that in some European villages the streets have not yet been named by the local government.

In these cases, the common search pattern is to use either the postal code or the locality name in order to localize the search, then use the house number to provide a specific result?

Please let me know if that is incorrect.

Fortunately, Pelias is fairly flexible and so I'll quickly discuss some possible solutions to the issue and their pros/cons, then maybe we can decide on a solution.

Also worth noting that this problem exists outside Europe and is much more common in Asia, but for now let's focus on Europe :)

The first hurdle is ensuring that the data is imported into Pelias. Using a query such as the one below, I can confirm that these addresses are not even being imported into the system, which needs to happen before they can be searchable:

http://pelias.github.io/compare/#/v1/place%3Fids=openstreetmap:address:way:410429784

The reason for this is that these records do not match any of the tag combinations listed in the features whitelist.

If you are running your own Pelias installation (using pelias/docker or any other setup) then you can edit that file before running the openstreetmap importer to select which features get imported.

In particular the first line addr:housenumber+addr:street says that an OSM record must have both the housenumber and street tags, so you could change that to just addr:housenumber.
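
As a rough illustration of what that change means, the sketch below mimics the `tagA+tagB` convention, where every tag in a combination must be present. It is not the importer's actual matching code: the function name and the sample tags are made up for the example, and other syntax the whitelist may support is ignored.

```javascript
// Illustrative only: the real matching happens inside the importer's PBF
// parser. This function merely mimics the "tagA+tagB" convention.
function matchesWhitelist(tags, whitelist) {
  return whitelist.some((combination) =>
    combination.split('+').every((key) => {
      const value = tags[key];
      return typeof value === 'string' && value.trim() !== '';
    })
  );
}

const defaultList = ['addr:housenumber+addr:street'];
const relaxedList = ['addr:housenumber'];

// Made-up tags, shaped like the streetless examples in this thread.
const streetless = { 'addr:housenumber': '29', 'addr:city': 'Kaszowo' };

console.log(matchesWhitelist(streetless, defaultList)); // false
console.log(matchesWhitelist(streetless, relaxedList)); // true
```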

We wouldn't consider accepting a Pull Request for something like this because it would negatively impact other areas of the world where we would like to enforce that the street must be specified (eg. a query like 99 Berlin doesn't make any sense).

Once the records are in the index you might need to make some changes in order to ensure that queries match. For /v1/autocomplete this might work without too much modification, for /v1/search I suspect it will be more complex.

For /v1/search, we use the libpostal library, which might struggle with streetless inputs. From a quick test I can confirm that this will be an issue:

/parse?address=Kaszowo 29

[
  {
    "label": "road",
    "value": "kaszowo"
  },
  {
    "label": "house_number",
    "value": "29"
  }
]

/parse?address=29 Kaszowo

[
  {
    "label": "house_number",
    "value": "29"
  },
  {
    "label": "city",
    "value": "kaszowo"
  }
]

Looking at the parses above, the result differs depending on the order of the inputs: in one case Kaszowo is a street name, in the other it's a locality. This is due to the nature of libpostal. It's a machine learning library trained on millions of inputs, and in this case it has no idea that this is Poland, so it's doing its best based on the token order.

So this will be an issue because, depending on the order of inputs, different queries will be generated against Pelias. I didn't check the code, but I'm assuming the housenumber+street parse would generate a normal address matching query and the housenumber+locality one might drop the housenumber and just return the locality. This would need more investigation.

I'm using the term 'locality' as a blanket term to cover 'towns', 'cities', 'villages' etc.

So for the normal address query, the simplest solution is to just copy the OSM addr:city tag to the addr:street tag. I'm not sure if they would be happy for you to edit this in OSM, but that would be a simple fix that would mostly 'just work', so you can consider asking on the mailing lists if it's ok for you to put the city name in the addr:street field for these addresses.
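
One way to picture the problem is a post-processing step that maps both parses above onto the same shape. This is a hypothetical sketch, not part of Pelias or libpostal; it only assumes the `{ label, value }` array format shown in the parses.

```javascript
// Hypothetical post-processing step, not part of Pelias or libpostal.
// Input: an array of { label, value } pairs as in the parses above.
function normalizeStreetlessParse(parse) {
  const byLabel = {};
  for (const { label, value } of parse) {
    byLabel[label] = value;
  }

  const labels = Object.keys(byLabel).sort().join(',');

  // "Kaszowo 29" parses as road + house_number, "29 Kaszowo" as
  // city + house_number. Treat both as housenumber + place.
  if (labels === 'house_number,road') {
    return { housenumber: byLabel.house_number, place: byLabel.road };
  }
  if (labels === 'city,house_number') {
    return { housenumber: byLabel.house_number, place: byLabel.city };
  }
  return null; // anything else is left to the normal query logic
}
```

Note that road + house_number is also what an ordinary street address looks like, so a rule like this would need some extra signal (a country filter, for instance) before it could be applied safely.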

Otherwise, you can write a little bit of code which performs that action, copying the locality name to the street during import into Pelias.

We wouldn't be able to accept a Pull Request for that either, because it's not correct to do that internationally and would negatively impact results elsewhere, but you could have that on your own installation.

What we could consider supporting is a list of these small towns with no street names, along with their bounding boxes. We could then generate a modified .osm.pbf file containing these 'patched' street names and publish a copy of that file which users can optionally include in their builds.

I hope that gives some background into the technical challenges with supporting this feature and some potential options you can explore.

missinglink avatar Oct 29 '18 10:10 missinglink

Thanks for your solution! I will check it today and give you feedback. In this case I have a couple more questions.

1*. You only mentioned the OSM importer, but what about the OpenAddresses/CSV importer? The main reason I want to use Pelias is the possibility of importing my own data. So is the only thing I need to do to duplicate the data from the city column into the street column?

2*. You also mentioned generating a modified .osm.pbf file. How would you do that? How can I do this on my own? The OSM data from Geofabrik already contains all the villages, but not many house numbers.

arqo123 avatar Oct 29 '18 13:10 arqo123

Hi @arqo123

  1. The openaddresses importer has similar restrictions to the OSM importer. In particular, it probably rejects any row without a valid STREET. It's pretty simple to copy CSV fields using awk or similar on the command line.

  2. Generating a modified .osm.pbf file is a little more tricky. The simplest way to do this is probably to use osmium-tool to convert a small .pbf file to XML format, make the changes and then convert it back to .pbf.

If you are looking for a docker version of osmium-tool I have one here: https://hub.docker.com/r/missinglink/osmium/
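
For point 1, the field copy could also be done in JavaScript rather than awk. The sketch below works on already-parsed rows and is only an illustration: the function name is hypothetical, and the NUMBER, STREET and CITY column names are assumed to follow the OpenAddresses CSV header.

```javascript
// Sketch only: column names are assumed to follow the OpenAddresses CSV
// header (NUMBER, STREET, CITY). The function itself is hypothetical.
function fillMissingStreet(row) {
  const street = (row.STREET || '').trim();
  const city = (row.CITY || '').trim();
  const number = (row.NUMBER || '').trim();

  // Only patch rows that have a house number and a city but no street.
  if (street === '' && city !== '' && number !== '') {
    return { ...row, STREET: city };
  }
  return row;
}

const patched = fillMissingStreet({ NUMBER: '29', STREET: '', CITY: 'Kaszowo' });
console.log(patched.STREET); // Kaszowo
```

Rows that already have a street, or that lack a house number, pass through unchanged.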

missinglink avatar Oct 29 '18 19:10 missinglink

OK, I've checked it and it didn't work. I cleared all my shards, changed the config file of the OSM importer and ran the import. The number of documents in the database is similar to what I got with the default settings. I tested it on the PBF extract for my voivodeship. screenshot_867

Anyway, I gave it a try. First I found a village that already exists even in the WOF data and tried to search for it in a few ways. No results.

*** UPDATE *** Anyway, I've edited a small piece of .osm data extracted from OSM, added some points and correctly updated my Nominatim database with it. Thanks once more! In my case the problem is solved.

arqo123 avatar Oct 29 '18 21:10 arqo123

Related to https://github.com/pelias/pelias/issues/641

orangejulius avatar Nov 06 '18 01:11 orangejulius

"If you are running your own Pelias installation (using pelias/docker or any other setup) then you can edit that file before running the openstreetmap importer to select which features get imported."

I am using the docker image. Do I need to edit features.js inside the overlay image?

What about adding addr:housenumber+addr:place to the tags list?

// default tags imported
var tags = [
  'addr:housenumber+addr:street',
  'addr:housenumber+addr:place'
];

sunblade avatar Jan 21 '20 09:01 sunblade

I think addr:housenumber+addr:place is a valid change for someone who'd like to investigate this further. Just be aware that while this setting configures which features are extracted from the PBF file, it doesn't specify how the data is mapped to the Pelias model.

In order to do that the address_extractor file would also need to be modified to do something like IF( HAS(housenumber) AND NOT(street) AND HAS(place) ) THEN address = CONCAT( housenumber, place ).
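
Spelled out in JavaScript, that rule might look something like the sketch below. It is a hypothetical helper operating on a plain tags object; the real address_extractor works on Pelias documents, which are not modelled here.

```javascript
// Hypothetical mapping step. The real address_extractor works on Pelias
// documents, which are not modelled here.
function extractAddress(tags) {
  const has = (key) => typeof tags[key] === 'string' && tags[key].trim() !== '';

  if (!has('addr:housenumber')) {
    return null; // not an address at all
  }
  const housenumber = tags['addr:housenumber'].trim();

  if (has('addr:street')) {
    return { housenumber, street: tags['addr:street'].trim() };
  }
  if (has('addr:place')) {
    // Streetless address: the place name stands in for the street.
    return { housenumber, street: tags['addr:place'].trim() };
  }
  return null;
}
```

Records with a house number but neither tag are still rejected, so this only widens the import for addresses that explicitly use addr:place.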

I suspect that this change will have a very positive impact for this specific issue :tada: However, I suspect (and I could be wrong), that it will introduce regressions in other parts of the world where data has been incorrectly mapped.

missinglink avatar Jan 21 '20 15:01 missinglink

See this overpass-turbo query which illustrates the false positives for a small region.

missinglink avatar Jan 21 '20 15:01 missinglink

If someone would like to investigate the data further I would love to see an analysis, I just had a look at a few and maybe it's workable 🤷‍♂

Eg. https://www.openstreetmap.org/node/661171687 is not so bad, it's mis-tagged but would work with addr:place instead of addr:street

Also the ones I linked above could be considered valid, I didn't read the corresponding mailing list thread.

missinglink avatar Jan 21 '20 15:01 missinglink

In OSM Wiki they say:

Use addr:place=* instead of addr:street=* for buildings whose number belongs not to a street, but to some other object. It is okay to have both addr:place=* and addr:city=* with the same value on the same object.

So in theory addr:housenumber+addr:place is as correct as addr:housenumber+addr:street, but there will always be unpleasant surprises with OSM... :confused:

Joxit avatar Jan 22 '20 11:01 Joxit

I have added this statement inside tag_mapper:

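// Fall back to addr:place when the record has no addr:street.
if (!tags.hasOwnProperty('addr:street') && tags.hasOwnProperty('addr:place')) {
  tags['addr:street'] = tags['addr:place'];
}

Seems to be working.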

sunblade avatar Jan 22 '20 12:01 sunblade

@orangejulius you've asked me here to provide examples so you can judge the impact.

TLDR: According to openaddresses, 37% of all address records in Poland do not have a street name!

Longer version: let's look at just Poland at http://results.openaddresses.io/ (the files start with pl/).

If we download all files and see how many addresses are there:

maciej.andrearczyk$ wc -l *
  483948 dolnoslaskie.csv
  335057 kujawsko-pomorskie.csv
  566467 lodzkie.csv
  570894 lubelskie.csv
  175186 lubuskie.csv
  759099 malopolskie.csv
 1129238 mazowieckie.csv
  217796 opolskie.csv
  509556 podkarpackie.csv
  265634 podlaskie.csv
  383109 pomorskie.csv
  752332 slaskie.csv
  303905 swietokrzyskie.csv
  230354 warminsko-mazurskie.csv
  712417 wielkopolskie.csv
  274643 zachodniopomorskie.csv
 7669635 total

That gives 7669635 addresses in total.

Now let's look at the number of address records without a street name. The street is the fourth column in those CSV files:

maciej.andrearczyk$ cat *.csv | cut -d ',' -f 4 | grep -c "^$"
2862376

That gives us 2862376 records. So about 37% of all records do not have a street name!
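
For anyone who wants to check the arithmetic, here is a small sketch using the two figures above. The only assumption is that the `wc -l` total includes one header line per file, which barely changes the result.

```javascript
// Figures are taken from the wc and grep output above.
const totalLines = 7669635; // assumed to include one header line per file
const streetlessRows = 2862376; // rows with an empty fourth column

function percentage(part, whole) {
  return (part / whole) * 100;
}

console.log(percentage(streetlessRows, totalLines).toFixed(1) + '%'); // 37.3%
```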

macieg avatar Apr 19 '20 17:04 macieg

Hi @macieg, thanks for the analysis, it looks like we're missing a high number of Polish addresses!

It seems that in these situations we can simply copy the locality name to the street field either in the OA CSV file or during the import process and it will fix a lot of the problem.

The thing I'm still not sure about is how we can distinguish these addresses from errors in the data where the street was either incorrectly mapped by the OA machine or the source data didn't have a street (when it should have) for some reason?

Can you think of a way we could distinguish between streetless addresses and incorrectly mapped OSM entries, for instance?

missinglink avatar Apr 20 '20 18:04 missinglink

I've thrown together a simple example for OpenAddresses, can you please try this docker image and let me know?

missinglink avatar Apr 20 '20 18:04 missinglink

Perhaps we can decide what to do when there is a housenumber but not a street on a country by country basis?

For example, I ran @macieg's analysis on all of OpenAddresses for the United States, and there were 9 million records without a street. These are almost certainly invalid, so we wouldn't want to import them.
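
A per-country rule could be as simple as the sketch below. Everything in it is hypothetical: the record shape, the function name, and especially the country list, which is just the set of countries mentioned in this thread rather than a vetted dataset.

```javascript
// Hypothetical policy table: the country list is only an example drawn from
// the countries mentioned in this thread, not a vetted dataset.
const STREETLESS_COUNTRIES = new Set(['PL', 'CZ', 'SK']);

function shouldImport(record) {
  const hasNumber = Boolean(record.housenumber);
  const hasStreet = Boolean(record.street);
  if (hasNumber && hasStreet) {
    return true;
  }
  // A house number without a street is accepted only where that is normal.
  return hasNumber && STREETLESS_COUNTRIES.has(record.country);
}

console.log(shouldImport({ housenumber: '29', street: '', country: 'PL' })); // true
console.log(shouldImport({ housenumber: '29', street: '', country: 'US' })); // false
```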

orangejulius avatar Apr 20 '20 18:04 orangejulius

I had a look at a few and the vast majority seem to be valid small towns with no street names and then you have places like https://www.openstreetmap.org/node/4665583100 which are right next to a bunch of named roads 🤷

missinglink avatar Apr 20 '20 18:04 missinglink

@missinglink, @orangejulius It's a pretty common case in Poland, so the 37% figure doesn't sound very surprising to me. I have no clue how it works in other countries.

It's also not surprising that there are villages next to cities with a bunch of named roads.

I can try to find confirmation from government statistics agencies to support what I'm saying, if it helps.

I'll check the docker image tomorrow and let you know if it works :)

macieg avatar Apr 20 '20 20:04 macieg

@missinglink I see no difference when using the docker image you've created, but I might be doing something wrong.

The only thing I did was update my docker-compose.yml file, so I now have

  openaddresses:
    image: pelias/openaddresses:polish_addresses
    container_name: pelias_openaddresses_pl
    user: "${DOCKER_USER}"
    volumes:
      - "./pelias.json:/code/pelias.json"
      - "${DATA_DIR}:/data"

I've set up everything again and I still have the same number of records missing:

020-04-21T16:42:22.848Z - verbose: [openaddresses] number of invalid records skipped: 386138

I've checked the status of images, and it looks good to me:

polpc05554:poland maciej.andrearczyk$ ../../pelias compose ps
         Name                        Command                 State                          Ports
----------------------------------------------------------------------------------------------------------------------
pelias_api                ./bin/start                      Up           0.0.0.0:4000->4000/tcp
pelias_csv_importer       /bin/bash                        Exit 0
pelias_elasticsearch      /usr/local/bin/docker-entr ...   Up           0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
pelias_interpolation      ./interpolate server /data ...   Restarting
pelias_libpostal          ./bin/wof-libpostal-server ...   Restarting
pelias_openaddresses_pl   /bin/bash                        Exit 0
pelias_pip-service        ./bin/start                      Up           0.0.0.0:4200->4200/tcp
pelias_placeholder        ./cmd/server.sh                  Up           0.0.0.0:4100->4100/tcp
pelias_schema             /bin/bash                        Exit 0
pelias_whosonfirst        /bin/bash                        Exit 0

What am I missing?

macieg avatar Apr 21 '20 16:04 macieg

Some progress has been made towards this issue in https://github.com/pelias/openstreetmap/pull/565.

The good news is that after we merge that PR we will be importing these addresses into elasticsearch. The bad news is that they still don't seem to be retrievable with our existing query logic.

Some additional work will still be required, which I mentioned in more detail in https://github.com/pelias/openstreetmap/pull/565#issuecomment-1070867061

missinglink avatar Mar 17 '22 12:03 missinglink