
Different results for pelias.github.io/compare and self deployed instance

Open msschl opened this issue 4 years ago • 12 comments

I noticed some differences when running the same queries on pelias.github.io and on our self deployed instance.

As an example (autocomplete):

  • pelias.github.io instance Saturnstraße Willich returns Saturnstraße, Willich, Deutschland as a result
  • self deployed instance Saturnstraße Willich returns 0 results:
{
    "geocoding": {
        "version": "0.2",
        "attribution": "http://127.0.0.1:8888/attribution",
        "query": {
            "text": "Saturnstraße Willich",
            "parser": "pelias",
            "parsed_text": {
                "subject": "Saturnstraße",
                "street": "Saturnstraße",
                "locality": "Willich",
                "admin": "Willich"
            },
            "size": 10,
            "layers": [
                "venue",
                "street",
                "country",
                "macroregion",
                "region",
                "county",
                "localadmin",
                "locality",
                "borough",
                "neighbourhood",
                "continent",
                "empire",
                "dependency",
                "macrocounty",
                "macrohood",
                "microhood",
                "disputed",
                "postalcode",
                "ocean",
                "marinearea"
            ],
            "private": false,
            "lang": {
                "name": "German",
                "iso6391": "de",
                "iso6393": "deu",
                "via": "querystring",
                "defaulted": false
            },
            "querySize": 20
        },
        "warnings": [
            "performance optimization: excluding 'address' layer",
            "Invalid Parameter: api_key"
        ],
        "engine": {
            "name": "Pelias",
            "author": "Mapzen",
            "version": "1.0"
        },
        "timestamp": 1606398993468
    },
    "type": "FeatureCollection",
    "features": []
}

The parsed text seems identical to the parsed text on pelias.github.io. The pelias.json is the same as the one in the projects/germany folder of the pelias/docker repo. The data in our self deployed instance was last imported on 24.11.2020.

Is there a way to further diagnose why some addresses are missing? The address itself seems to have been imported correctly into the Elasticsearch cluster: (image)

msschl avatar Nov 26 '20 14:11 msschl

Debug level 3 output:

{
    "geocoding": {
        "version": "0.2",
        "attribution": "http://127.0.0.1:8888/attribution",
        "query": {
            "enableDebug": true,
            "exposeInternalDebugTools": true,
            "enableElasticDebug": true,
            "enableElasticExplain": true,
            "text": "Saturnstraße Willich",
            "parser": "pelias",
            "parsed_text": {
                "subject": "Saturnstraße",
                "street": "Saturnstraße",
                "locality": "Willich",
                "admin": "Willich"
            },
            "size": 10,
            "layers": [
                "venue",
                "street",
                "country",
                "macroregion",
                "region",
                "county",
                "localadmin",
                "locality",
                "borough",
                "neighbourhood",
                "continent",
                "empire",
                "dependency",
                "macrocounty",
                "macrohood",
                "microhood",
                "disputed",
                "postalcode",
                "ocean",
                "marinearea"
            ],
            "private": false,
            "lang": {
                "name": "German",
                "iso6391": "de",
                "iso6393": "deu",
                "via": "querystring",
                "defaulted": false
            },
            "querySize": 20
        },
        "warnings": [
            "performance optimization: excluding 'address' layer",
            "Invalid Parameter: api_key"
        ],
        "debug": [
            {
                "controller:predicates:has_response_data": {
                    "reply": false,
                    "stack_trace": "at controller (/home/pelias/controller/search.js:16:10)"
                }
            },
            {
                "controller:predicates:has_request_errors": {
                    "reply": false,
                    "stack_trace": "at controller (/home/pelias/controller/search.js:16:10)"
                }
            },
            {
                "controller:search": {
                    "debugUrl": "http://elasticsearch-master.elasticsearch.svc.k8s-cluster.company.xyz:9200/pelias/_search?source_content_type=application%2Fjson&source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22constant_score%22%3A%7B%22filter%22%3A%7B%22multi_match%22%3A%7B%22type%22%3A%22phrase%22%2C%22query%22%3A%22Saturnstra%C3%9Fe%22%2C%22fields%22%3A%5B%22name.default%22%2C%22name.de%22%5D%2C%22analyzer%22%3A%22peliasQuery%22%2C%22boost%22%3A100%2C%22slop%22%3A3%7D%7D%7D%7D%2C%7B%22multi_match%22%3A%7B%22type%22%3A%22cross_fields%22%2C%22query%22%3A%22Willich%22%2C%22fields%22%3A%5B%22parent.country.ngram%5E1%22%2C%22parent.dependency.ngram%5E1%22%2C%22parent.macroregion.ngram%5E1%22%2C%22parent.region.ngram%5E1%22%2C%22parent.county.ngram%5E1%22%2C%22parent.localadmin.ngram%5E1%22%2C%22parent.locality.ngram%5E1%22%2C%22parent.borough.ngram%5E1%22%2C%22parent.neighbourhood.ngram%5E1%22%2C%22parent.locality_a.ngram%5E1%22%2C%22parent.region_a.ngram%5E1%22%2C%22parent.country_a.ngram%5E1%22%2C%22name.default%5E1.5%22%2C%22name.de%5E1.5%22%5D%2C%22analyzer%22%3A%22peliasAdmin%22%7D%7D%5D%2C%22should%22%3A%5B%7B%22function_score%22%3A%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22max_boost%22%3A20%2C%22functions%22%3A%5B%7B%22field_value_factor%22%3A%7B%22modifier%22%3A%22log1p%22%2C%22field%22%3A%22popularity%22%2C%22missing%22%3A1%7D%2C%22weight%22%3A1%7D%5D%2C%22score_mode%22%3A%22first%22%2C%22boost_mode%22%3A%22replace%22%7D%7D%2C%7B%22function_score%22%3A%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22max_boost%22%3A20%2C%22functions%22%3A%5B%7B%22field_value_factor%22%3A%7B%22modifier%22%3A%22log1p%22%2C%22field%22%3A%22population%22%2C%22missing%22%3A1%7D%2C%22weight%22%3A3%7D%5D%2C%22score_mode%22%3A%22first%22%2C%22boost_mode%22%3A%22replace%22%7D%7D%5D%2C%22filter%22%3A%5B%7B%22terms%22%3A%7B%22layer%22%3A%5B%22venue%22%2C%22street%22%2C%22country%22%2C%22macroregion%22%2C%22region%22%2C%22county%22%2C%22localadmin%22%2C%22loc
ality%22%2C%22borough%22%2C%22neighbourhood%22%2C%22continent%22%2C%22empire%22%2C%22dependency%22%2C%22macrocounty%22%2C%22macrohood%22%2C%22microhood%22%2C%22disputed%22%2C%22postalcode%22%2C%22ocean%22%2C%22marinearea%22%5D%7D%7D%5D%7D%7D%2C%22size%22%3A20%2C%22track_scores%22%3Atrue%2C%22sort%22%3A%5B%22_score%22%5D%7D",
                    "ES_req": {
                        "index": "pelias",
                        "searchType": "dfs_query_then_fetch",
                        "body": {
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "constant_score": {
                                                "filter": {
                                                    "multi_match": {
                                                        "type": "phrase",
                                                        "query": "Saturnstraße",
                                                        "fields": [
                                                            "name.default",
                                                            "name.de"
                                                        ],
                                                        "analyzer": "peliasQuery",
                                                        "boost": 100,
                                                        "slop": 3
                                                    }
                                                }
                                            }
                                        },
                                        {
                                            "multi_match": {
                                                "type": "cross_fields",
                                                "query": "Willich",
                                                "fields": [
                                                    "parent.country.ngram^1",
                                                    "parent.dependency.ngram^1",
                                                    "parent.macroregion.ngram^1",
                                                    "parent.region.ngram^1",
                                                    "parent.county.ngram^1",
                                                    "parent.localadmin.ngram^1",
                                                    "parent.locality.ngram^1",
                                                    "parent.borough.ngram^1",
                                                    "parent.neighbourhood.ngram^1",
                                                    "parent.locality_a.ngram^1",
                                                    "parent.region_a.ngram^1",
                                                    "parent.country_a.ngram^1",
                                                    "name.default^1.5",
                                                    "name.de^1.5"
                                                ],
                                                "analyzer": "peliasAdmin"
                                            }
                                        }
                                    ],
                                    "should": [
                                        {
                                            "function_score": {
                                                "query": {
                                                    "match_all": {}
                                                },
                                                "max_boost": 20,
                                                "functions": [
                                                    {
                                                        "field_value_factor": {
                                                            "modifier": "log1p",
                                                            "field": "popularity",
                                                            "missing": 1
                                                        },
                                                        "weight": 1
                                                    }
                                                ],
                                                "score_mode": "first",
                                                "boost_mode": "replace"
                                            }
                                        },
                                        {
                                            "function_score": {
                                                "query": {
                                                    "match_all": {}
                                                },
                                                "max_boost": 20,
                                                "functions": [
                                                    {
                                                        "field_value_factor": {
                                                            "modifier": "log1p",
                                                            "field": "population",
                                                            "missing": 1
                                                        },
                                                        "weight": 3
                                                    }
                                                ],
                                                "score_mode": "first",
                                                "boost_mode": "replace"
                                            }
                                        }
                                    ],
                                    "filter": [
                                        {
                                            "terms": {
                                                "layer": [
                                                    "venue",
                                                    "street",
                                                    "country",
                                                    "macroregion",
                                                    "region",
                                                    "county",
                                                    "localadmin",
                                                    "locality",
                                                    "borough",
                                                    "neighbourhood",
                                                    "continent",
                                                    "empire",
                                                    "dependency",
                                                    "macrocounty",
                                                    "macrohood",
                                                    "microhood",
                                                    "disputed",
                                                    "postalcode",
                                                    "ocean",
                                                    "marinearea"
                                                ]
                                            }
                                        }
                                    ]
                                }
                            },
                            "size": 20,
                            "track_scores": true,
                            "sort": [
                                "_score"
                            ]
                        },
                        "explain": true
                    }
                }
            },
            {
                "controller:search": "Timer Began. Attempt 1"
            },
            {
                "controller:search": "Timer Stopped. 0 ms"
            },
            {
                "controller:search": {
                    "queryType": {
                        "autocomplete": {
                            "es_took": 4,
                            "response_time": 7,
                            "retries": 0,
                            "es_hits": 0,
                            "es_result_count": 0
                        }
                    }
                }
            },
            {
                "controller:search": {
                    "ES_response": {
                        "docs": [],
                        "meta": {
                            "scores": []
                        },
                        "data": {
                            "took": 4,
                            "timed_out": false,
                            "_shards": {
                                "total": 1,
                                "successful": 1,
                                "skipped": 0,
                                "failed": 0
                            },
                            "hits": {
                                "total": {
                                    "value": 0,
                                    "relation": "eq"
                                },
                                "max_score": null,
                                "hits": []
                            },
                            "response_time": 7
                        }
                    }
                }
            },
            {
                "controller:predicates:has_response_data": {
                    "reply": false,
                    "stack_trace": "at controller (/home/pelias/middleware/changeLanguage.js:32:10)"
                }
            }
        ],
        "engine": {
            "name": "Pelias",
            "author": "Mapzen",
            "version": "1.0"
        },
        "timestamp": 1606400384932
    },
    "type": "FeatureCollection",
    "features": []
}

msschl avatar Nov 26 '20 14:11 msschl

To reproduce, run Pelias with the configuration from projects/germany in the pelias/docker repo.

msschl avatar Dec 10 '20 15:12 msschl

@orangejulius @missinglink

Ok, I found the issue. By passing layers=coarse,address,venue,neighbourhood,locality,localadmin,county,macrocounty,region,borough,country as query parameter the results are the same between pelias.github.io/compare and a self deployed instance.
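
For anyone applying the same workaround, here is a minimal sketch of building such a request URL. The host and port in `BASE_URL` are placeholders for a self deployed instance (an assumption, not taken from this thread), and `autocomplete_url` is a hypothetical helper; only the `layers` value is copied from the comment above.

```python
from urllib.parse import urlencode

# Placeholder base URL for a self deployed instance (assumption: adjust host and port).
BASE_URL = "http://localhost:4000/v1/autocomplete"

# Layer list copied from the comment above.
LAYERS = [
    "coarse", "address", "venue", "neighbourhood", "locality", "localadmin",
    "county", "macrocounty", "region", "borough", "country",
]


def autocomplete_url(text, layers=LAYERS):
    """Build an autocomplete URL that sets the layers parameter explicitly."""
    params = {"text": text, "layers": ",".join(layers)}
    # safe="," keeps the comma-separated layer list readable in the URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = autocomplete_url("Saturnstraße Willich")
print(url)
```

The URL can then be opened in a browser or passed to any HTTP client.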

So the question is, why does api.geocode.earth include those layers by default?

Also, when performing an autocomplete search with Marsweg 47877 as the query text, the parsed text is correct:

"parsed_text": {
  "subject": "Marsweg",
  "street": "Marsweg",
  "postcode": "47877"
}

However, the postcode is not applied to the search, so the results come from many different places and do not include the locality I actually searched for:

0) Wohnstätte Marsweg, Fürth, BY, Deutschland
1) Marsweg 10, Ahlen, NW, Deutschland
2) Marsweg 9, Essen, NW, Deutschland
3) Marsweg 10, Essen, NW, Deutschland
4) Marsweg 6, Ahlen, NW, Deutschland
5) Marsweg 8, Ahlen, NW, Deutschland
6) Marsweg 12, Ahlen, NW, Deutschland
7) Marsweg 14, Ahlen, NW, Deutschland
8) Marsweg 16, Ahlen, NW, Deutschland
9) Marsweg 18, Ahlen, NW, Deutschland

msschl avatar Aug 13 '21 10:08 msschl

Hi @msschl, we're a small team and can't help debug custom installations; it takes a lot of our time away from developing and maintaining the software.

Some tips:

  • Make sure everything is up-to-date
  • If you're using the pelias/docker repo to build and run your installation, try the pelias elastic stats command to see which layers you have imported and their relative document counts.
  • If you're using non-standard layers then you should ensure targets.auto_discover is enabled (it's enabled by default).

Regarding your other question, streets don't have postcodes in Pelias.

Even in Germany, streets commonly cross Kiez/Bezirk/Stadt boundaries, so a single street can have multiple postcodes. These are quite difficult to compute. IIRC, Nominatim associates a single postcode with each street at the midpoint of the linestring, but it uses a PostGIS server with the whole of OSM loaded to accomplish this, and it's still error-prone.

For that reason we consider postcode to be a property of an address only, so a query clause is only generated for address queries (i.e. ones which include both a street and a house number).
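
The rule as described here can be sketched as a small predicate. This is only an illustration of the statement above, not the actual Pelias query-building code; the function name is hypothetical, and `housenumber` as the key for the house number in `parsed_text` is an assumption.

```python
def wants_postcode_clause(parsed_text):
    """Return True only when the parse contains a street, a house number and a postcode."""
    has_street = bool(parsed_text.get("street"))
    has_number = bool(parsed_text.get("housenumber"))  # assumed key name
    has_postcode = bool(parsed_text.get("postcode"))
    return has_street and has_number and has_postcode


# The parse reported earlier in this thread: street plus postcode, no house number.
street_only = {"subject": "Marsweg", "street": "Marsweg", "postcode": "47877"}

# A hypothetical full-address parse for comparison.
full_address = {"street": "Marsweg", "housenumber": "10", "postcode": "47877"}

print(wants_postcode_clause(street_only))
print(wants_postcode_clause(full_address))
```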

missinglink avatar Aug 13 '21 11:08 missinglink

[edit] Maybe that's not entirely correct: the postcode portion of the parse doesn't seem to be applied for fully-qualified address queries either. Can you please open a separate issue to discuss this?

missinglink avatar Aug 13 '21 11:08 missinglink

Thanks @missinglink

  • We are using the latest builds
  • The results of pelias elastic stats seem reasonable:
{
  "took" : 3359,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sources" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "openstreetmap",
          "doc_count" : 19168326,
          "layers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "address",
                "doc_count" : 16250856
              },
              {
                "key" : "venue",
                "doc_count" : 2917470
              }
            ]
          }
        },
        {
          "key" : "openaddresses",
          "doc_count" : 6951114,
          "layers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "address",
                "doc_count" : 6951114
              }
            ]
          }
        },
        {
          "key" : "whosonfirst",
          "doc_count" : 84900,
          "layers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "neighbourhood",
                "doc_count" : 67384
              },
              {
                "key" : "locality",
                "doc_count" : 12423
              },
              {
                "key" : "localadmin",
                "doc_count" : 4646
              },
              {
                "key" : "county",
                "doc_count" : 399
              },
              {
                "key" : "macrocounty",
                "doc_count" : 19
              },
              {
                "key" : "region",
                "doc_count" : 16
              },
              {
                "key" : "borough",
                "doc_count" : 12
              },
              {
                "key" : "country",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
  • We are using the standard layers

So I don't know why the layers are not automatically included in the autocomplete query. However, I'm fine with passing layers=coarse,address,venue,neighbourhood,locality,localadmin,county,macrocounty,region,borough,country as a query parameter to get all the results.

Regarding the postcode I'll open a new issue to discuss this.

msschl avatar Aug 13 '21 12:08 msschl

You don't have anything on the street layer, this is usually provided by https://github.com/pelias/polylines
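
A gap like this can be spotted mechanically from the stats output. The sketch below assumes the response has the shape posted above (`aggregations.sources.buckets[].layers.buckets[]`); the function names are hypothetical, and the sample data is a trimmed copy of the numbers from this thread.

```python
def indexed_layers(stats):
    """Sum document counts per layer across all sources."""
    totals = {}
    for source in stats["aggregations"]["sources"]["buckets"]:
        for layer in source["layers"]["buckets"]:
            totals[layer["key"]] = totals.get(layer["key"], 0) + layer["doc_count"]
    return totals


def missing_layers(stats, wanted):
    """Return the wanted layers that have no indexed documents."""
    present = indexed_layers(stats)
    return [name for name in wanted if present.get(name, 0) == 0]


# Trimmed copy of the stats output posted above.
stats = {
    "aggregations": {"sources": {"buckets": [
        {"key": "openstreetmap", "layers": {"buckets": [
            {"key": "address", "doc_count": 16250856},
            {"key": "venue", "doc_count": 2917470},
        ]}},
        {"key": "openaddresses", "layers": {"buckets": [
            {"key": "address", "doc_count": 6951114},
        ]}},
        {"key": "whosonfirst", "layers": {"buckets": [
            {"key": "locality", "doc_count": 12423},
        ]}},
    ]}}
}

print(missing_layers(stats, ["venue", "street", "address", "locality"]))
```

Comparing against the `layers` array echoed in the autocomplete response shows which of the queried layers can never match anything.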

missinglink avatar Aug 13 '21 14:08 missinglink

Won't running pelias prepare all also prepare the polylines from the OSM data set?

msschl avatar Aug 13 '21 15:08 msschl

Oh ok, after checking the docker_extract.sh script in pelias/polylines I noticed that the script exits if the pbf file is greater than 1GB. This, however, is not obvious from the message. The message should probably be extended with a sentence stating that the polyline extract process is about to exit and that no polylines will therefore be extracted.
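
To illustrate the kind of message meant here, a minimal sketch follows. It is not the docker_extract.sh logic; the function name, the wording and the reading of "1GB" as 10**9 bytes are all assumptions.

```python
# Assumption: "1GB" is read here as 10**9 bytes; the real script may define it differently.
MAX_PBF_BYTES = 10**9


def extraction_notice(size_bytes, limit=MAX_PBF_BYTES):
    """Return (will_extract, message) for a PBF file of the given size."""
    if size_bytes > limit:
        return False, (
            f"PBF file is {size_bytes} bytes, above the {limit}-byte limit. "
            "The polyline extract will exit now, so NO polylines will be extracted."
        )
    return True, f"PBF file is {size_bytes} bytes, extracting polylines."


# Hypothetical file sizes, one below and one above the limit.
for size in (500_000_000, 3_500_000_000):
    ok, message = extraction_notice(size)
    print(ok, message)
```

In a real script the size would come from the file itself, for example via `os.path.getsize(path)`.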

https://github.com/pelias/polylines/pull/248 https://github.com/pelias/polylines/pull/248#issuecomment-601138283 https://github.com/pelias/docker/issues/198

msschl avatar Aug 13 '21 15:08 msschl

I've gone ahead and updated the warning message in pelias/polylines#259.

I also figured out that the pelias.json config file expects an array of files under imports.polyline.files; however, bin/cli.js only uses the first file in the array when importing. See bin/cli.js#L39. I would suggest either deprecating imports.polyline.files in favour of imports.polyline.file, which would expect a string, or changing bin/cli.js to support importing multiple polyline files.

@missinglink What do you think?

msschl avatar Aug 17 '21 14:08 msschl

I'm reluctant to change config variables: you can deprecate them, but people end up using the old version for years afterwards.

I opened this recently which might help: https://github.com/pelias/interpolation/pull/269

missinglink avatar Aug 17 '21 15:08 missinglink

That looks good. Let's add at least a warning if more than one file is specified in the pelias.json config file for now.

Maybe this warning could be removed in the future and support for importing multiple polyline files can be introduced.
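
A rough sketch of what such a warning could look like, written in Python purely for illustration (the importer itself is JavaScript). `select_polyline_file` is a hypothetical helper and the file names are made up; only the imports.polyline.files key and the "first file wins" behaviour come from this thread.

```python
import warnings


def select_polyline_file(config):
    """Return the first configured polyline file and warn if others would be ignored."""
    files = config.get("imports", {}).get("polyline", {}).get("files", [])
    if not files:
        raise ValueError("imports.polyline.files is empty or missing")
    if len(files) > 1:
        warnings.warn(
            f"imports.polyline.files lists {len(files)} files, "
            f"but only the first one ({files[0]}) will be imported"
        )
    return files[0]


# Hypothetical config with two files; only the first is returned.
config = {"imports": {"polyline": {"files": ["extract.0sv", "extra.0sv"]}}}
print(select_polyline_file(config))
```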

msschl avatar Aug 17 '21 15:08 msschl