Different results for pelias.github.io/compare and self-deployed instance
I noticed some differences between running queries on pelias.github.io and on our self-deployed instance.
As an example (autocomplete):
- pelias.github.io instance: Saturnstraße Willich returns Saturnstraße, Willich, Deutschland as a result
- self-deployed instance: Saturnstraße Willich returns 0 results:
{
"geocoding": {
"version": "0.2",
"attribution": "http://127.0.0.1:8888/attribution",
"query": {
"text": "Saturnstraße Willich",
"parser": "pelias",
"parsed_text": {
"subject": "Saturnstraße",
"street": "Saturnstraße",
"locality": "Willich",
"admin": "Willich"
},
"size": 10,
"layers": [
"venue",
"street",
"country",
"macroregion",
"region",
"county",
"localadmin",
"locality",
"borough",
"neighbourhood",
"continent",
"empire",
"dependency",
"macrocounty",
"macrohood",
"microhood",
"disputed",
"postalcode",
"ocean",
"marinearea"
],
"private": false,
"lang": {
"name": "German",
"iso6391": "de",
"iso6393": "deu",
"via": "querystring",
"defaulted": false
},
"querySize": 20
},
"warnings": [
"performance optimization: excluding 'address' layer",
"Invalid Parameter: api_key"
],
"engine": {
"name": "Pelias",
"author": "Mapzen",
"version": "1.0"
},
"timestamp": 1606398993468
},
"type": "FeatureCollection",
"features": []
}
The parsed text seems identical to the parsed text on pelias.github.io.
The pelias.json is the same as the one in the projects/germany folder of the pelias/docker repo.
The data in our self deployed instance was last imported on 24.11.2020.
Is there a way to further diagnose why some addresses are missing?
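One way to start narrowing this down is to read the layers and warnings that the API echoes back in the response. The following is a minimal Python sketch, not part of Pelias: the helper name `diagnose_response` and the `expected_layers` default are hypothetical, and it only reads the `geocoding.query.layers`, `geocoding.warnings` and `features` fields visible in the response above.

```python
def diagnose_response(response, expected_layers=("address", "street")):
    """Summarise which expected layers were NOT searched, plus any warnings.

    `response` is the parsed JSON body of a Pelias API response.
    `expected_layers` is an assumption about what the caller wants searched.
    """
    geocoding = response.get("geocoding", {})
    searched = set(geocoding.get("query", {}).get("layers", []))
    return {
        "not_searched": [l for l in expected_layers if l not in searched],
        "warnings": list(geocoding.get("warnings", [])),
        "feature_count": len(response.get("features", [])),
    }


# Abbreviated copy of the response shown above.
example = {
    "geocoding": {
        "query": {"layers": ["venue", "street", "country", "locality"]},
        "warnings": [
            "performance optimization: excluding 'address' layer",
            "Invalid Parameter: api_key",
        ],
    },
    "type": "FeatureCollection",
    "features": [],
}

report = diagnose_response(example)
print(report["not_searched"])
```

For the response above, this reports that the address layer was not part of the query, which matches the first warning.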
Furthermore, the address seems to have been imported correctly into the Elasticsearch cluster.
Debug level 3 output:
{
"geocoding": {
"version": "0.2",
"attribution": "http://127.0.0.1:8888/attribution",
"query": {
"enableDebug": true,
"exposeInternalDebugTools": true,
"enableElasticDebug": true,
"enableElasticExplain": true,
"text": "Saturnstraße Willich",
"parser": "pelias",
"parsed_text": {
"subject": "Saturnstraße",
"street": "Saturnstraße",
"locality": "Willich",
"admin": "Willich"
},
"size": 10,
"layers": [
"venue",
"street",
"country",
"macroregion",
"region",
"county",
"localadmin",
"locality",
"borough",
"neighbourhood",
"continent",
"empire",
"dependency",
"macrocounty",
"macrohood",
"microhood",
"disputed",
"postalcode",
"ocean",
"marinearea"
],
"private": false,
"lang": {
"name": "German",
"iso6391": "de",
"iso6393": "deu",
"via": "querystring",
"defaulted": false
},
"querySize": 20
},
"warnings": [
"performance optimization: excluding 'address' layer",
"Invalid Parameter: api_key"
],
"debug": [
{
"controller:predicates:has_response_data": {
"reply": false,
"stack_trace": "at controller (/home/pelias/controller/search.js:16:10)"
}
},
{
"controller:predicates:has_request_errors": {
"reply": false,
"stack_trace": "at controller (/home/pelias/controller/search.js:16:10)"
}
},
{
"controller:search": {
"debugUrl": "http://elasticsearch-master.elasticsearch.svc.k8s-cluster.company.xyz:9200/pelias/_search?source_content_type=application%2Fjson&source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22constant_score%22%3A%7B%22filter%22%3A%7B%22multi_match%22%3A%7B%22type%22%3A%22phrase%22%2C%22query%22%3A%22Saturnstra%C3%9Fe%22%2C%22fields%22%3A%5B%22name.default%22%2C%22name.de%22%5D%2C%22analyzer%22%3A%22peliasQuery%22%2C%22boost%22%3A100%2C%22slop%22%3A3%7D%7D%7D%7D%2C%7B%22multi_match%22%3A%7B%22type%22%3A%22cross_fields%22%2C%22query%22%3A%22Willich%22%2C%22fields%22%3A%5B%22parent.country.ngram%5E1%22%2C%22parent.dependency.ngram%5E1%22%2C%22parent.macroregion.ngram%5E1%22%2C%22parent.region.ngram%5E1%22%2C%22parent.county.ngram%5E1%22%2C%22parent.localadmin.ngram%5E1%22%2C%22parent.locality.ngram%5E1%22%2C%22parent.borough.ngram%5E1%22%2C%22parent.neighbourhood.ngram%5E1%22%2C%22parent.locality_a.ngram%5E1%22%2C%22parent.region_a.ngram%5E1%22%2C%22parent.country_a.ngram%5E1%22%2C%22name.default%5E1.5%22%2C%22name.de%5E1.5%22%5D%2C%22analyzer%22%3A%22peliasAdmin%22%7D%7D%5D%2C%22should%22%3A%5B%7B%22function_score%22%3A%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22max_boost%22%3A20%2C%22functions%22%3A%5B%7B%22field_value_factor%22%3A%7B%22modifier%22%3A%22log1p%22%2C%22field%22%3A%22popularity%22%2C%22missing%22%3A1%7D%2C%22weight%22%3A1%7D%5D%2C%22score_mode%22%3A%22first%22%2C%22boost_mode%22%3A%22replace%22%7D%7D%2C%7B%22function_score%22%3A%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22max_boost%22%3A20%2C%22functions%22%3A%5B%7B%22field_value_factor%22%3A%7B%22modifier%22%3A%22log1p%22%2C%22field%22%3A%22population%22%2C%22missing%22%3A1%7D%2C%22weight%22%3A3%7D%5D%2C%22score_mode%22%3A%22first%22%2C%22boost_mode%22%3A%22replace%22%7D%7D%5D%2C%22filter%22%3A%5B%7B%22terms%22%3A%7B%22layer%22%3A%5B%22venue%22%2C%22street%22%2C%22country%22%2C%22macroregion%22%2C%22region%22%2C%22county%22%2C%22localadmin%22%2C%22locality%22%2C%22boroug
h%22%2C%22neighbourhood%22%2C%22continent%22%2C%22empire%22%2C%22dependency%22%2C%22macrocounty%22%2C%22macrohood%22%2C%22microhood%22%2C%22disputed%22%2C%22postalcode%22%2C%22ocean%22%2C%22marinearea%22%5D%7D%7D%5D%7D%7D%2C%22size%22%3A20%2C%22track_scores%22%3Atrue%2C%22sort%22%3A%5B%22_score%22%5D%7D",
"ES_req": {
"index": "pelias",
"searchType": "dfs_query_then_fetch",
"body": {
"query": {
"bool": {
"must": [
{
"constant_score": {
"filter": {
"multi_match": {
"type": "phrase",
"query": "Saturnstraße",
"fields": [
"name.default",
"name.de"
],
"analyzer": "peliasQuery",
"boost": 100,
"slop": 3
}
}
}
},
{
"multi_match": {
"type": "cross_fields",
"query": "Willich",
"fields": [
"parent.country.ngram^1",
"parent.dependency.ngram^1",
"parent.macroregion.ngram^1",
"parent.region.ngram^1",
"parent.county.ngram^1",
"parent.localadmin.ngram^1",
"parent.locality.ngram^1",
"parent.borough.ngram^1",
"parent.neighbourhood.ngram^1",
"parent.locality_a.ngram^1",
"parent.region_a.ngram^1",
"parent.country_a.ngram^1",
"name.default^1.5",
"name.de^1.5"
],
"analyzer": "peliasAdmin"
}
}
],
"should": [
{
"function_score": {
"query": {
"match_all": {}
},
"max_boost": 20,
"functions": [
{
"field_value_factor": {
"modifier": "log1p",
"field": "popularity",
"missing": 1
},
"weight": 1
}
],
"score_mode": "first",
"boost_mode": "replace"
}
},
{
"function_score": {
"query": {
"match_all": {}
},
"max_boost": 20,
"functions": [
{
"field_value_factor": {
"modifier": "log1p",
"field": "population",
"missing": 1
},
"weight": 3
}
],
"score_mode": "first",
"boost_mode": "replace"
}
}
],
"filter": [
{
"terms": {
"layer": [
"venue",
"street",
"country",
"macroregion",
"region",
"county",
"localadmin",
"locality",
"borough",
"neighbourhood",
"continent",
"empire",
"dependency",
"macrocounty",
"macrohood",
"microhood",
"disputed",
"postalcode",
"ocean",
"marinearea"
]
}
}
]
}
},
"size": 20,
"track_scores": true,
"sort": [
"_score"
]
},
"explain": true
}
}
},
{
"controller:search": "Timer Began. Attempt 1"
},
{
"controller:search": "Timer Stopped. 0 ms"
},
{
"controller:search": {
"queryType": {
"autocomplete": {
"es_took": 4,
"response_time": 7,
"retries": 0,
"es_hits": 0,
"es_result_count": 0
}
}
}
},
{
"controller:search": {
"ES_response": {
"docs": [],
"meta": {
"scores": []
},
"data": {
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"response_time": 7
}
}
}
},
{
"controller:predicates:has_response_data": {
"reply": false,
"stack_trace": "at controller (/home/pelias/middleware/changeLanguage.js:32:10)"
}
}
],
"engine": {
"name": "Pelias",
"author": "Mapzen",
"version": "1.0"
},
"timestamp": 1606400384932
},
"type": "FeatureCollection",
"features": []
}
To reproduce, run Pelias with the configuration of the projects/germany project in the pelias/docker repo.
@orangejulius @missinglink
OK, I found the issue. When layers=coarse,address,venue,neighbourhood,locality,localadmin,county,macrocounty,region,borough,country is passed as a query parameter, the results are the same on pelias.github.io/compare and on a self-deployed instance.
So the question is: why does api.geocode.earth include those layers by default?
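For anyone wanting to apply the same workaround from a script, here is a minimal Python sketch that builds such a request URL. The base URL `http://localhost:4000` and the helper name `autocomplete_url` are assumptions for illustration; only the `/v1/autocomplete` path and the `text` and `layers` parameters come from the Pelias API itself.

```python
from urllib.parse import urlencode

# Assumption: address of the self-deployed Pelias API; adjust to your setup.
BASE_URL = "http://localhost:4000"

# The layer list from the workaround described above.
LAYERS = [
    "coarse", "address", "venue", "neighbourhood", "locality", "localadmin",
    "county", "macrocounty", "region", "borough", "country",
]


def autocomplete_url(text, layers=LAYERS, base_url=BASE_URL):
    """Build an autocomplete URL with an explicit, comma-separated layers list."""
    params = {"text": text, "layers": ",".join(layers)}
    return f"{base_url}/v1/autocomplete?{urlencode(params)}"


url = autocomplete_url("Saturnstraße Willich")
print(url)
```

Passing the list explicitly keeps the request independent of whatever default layers a given deployment applies.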
Also, when performing an autocomplete search with Marsweg 47877 as the query text, the parsed text is correct:
"parsed_text": {
"subject": "Marsweg",
"street": "Marsweg",
"postcode": "47877"
}
However, the postcode is not applied to the search, so the results come from a number of different places and do not include the locality that was actually searched for:
0) Wohnstätte Marsweg, Fürth, BY, Deutschland
1) Marsweg 10, Ahlen, NW, Deutschland
2) Marsweg 9, Essen, NW, Deutschland
3) Marsweg 10, Essen, NW, Deutschland
4) Marsweg 6, Ahlen, NW, Deutschland
5) Marsweg 8, Ahlen, NW, Deutschland
6) Marsweg 12, Ahlen, NW, Deutschland
7) Marsweg 14, Ahlen, NW, Deutschland
8) Marsweg 16, Ahlen, NW, Deutschland
9) Marsweg 18, Ahlen, NW, Deutschland
Hi @msschl, we're a small team and can't help debug custom installations; it takes a lot of our time away from developing and maintaining the software.
Some tips:
- Make sure everything is up-to-date
- If you're using the pelias/docker repo to build and run your installation, try the pelias elastic stats command to see which layers you have imported and their relative document counts.
- If you're using non-standard layers then you should ensure targets.auto_discover is enabled (it's enabled by default).
Regarding your other question, streets don't have postcodes in Pelias.
Even in Germany, streets commonly cross Kiez/Bezirk/Stadt boundaries, so a single street could potentially have multiple postcodes. These are quite difficult to compute. IIRC, Nominatim associates a single postcode with each street at the midpoint along the linestring, but they use a PostGIS server with the whole of OSM loaded to accomplish this, and it's still error-prone.
For that reason we consider postcode to be a property of an address only, so a query clause is only generated for address queries (i.e. ones which include both a street and a house number).
[edit] maybe that's not entirely correct, the postcode portion of the parse for fully-qualified address queries doesn't seem to be applied, can you please open a separate issue to discuss this?
Thanks @missinglink
- We are using the latest builds
- The results of pelias elastic stats seem reasonable:
{
"took" : 3359,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sources" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "openstreetmap",
"doc_count" : 19168326,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "address",
"doc_count" : 16250856
},
{
"key" : "venue",
"doc_count" : 2917470
}
]
}
},
{
"key" : "openaddresses",
"doc_count" : 6951114,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "address",
"doc_count" : 6951114
}
]
}
},
{
"key" : "whosonfirst",
"doc_count" : 84900,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "neighbourhood",
"doc_count" : 67384
},
{
"key" : "locality",
"doc_count" : 12423
},
{
"key" : "localadmin",
"doc_count" : 4646
},
{
"key" : "county",
"doc_count" : 399
},
{
"key" : "macrocounty",
"doc_count" : 19
},
{
"key" : "region",
"doc_count" : 16
},
{
"key" : "borough",
"doc_count" : 12
},
{
"key" : "country",
"doc_count" : 1
}
]
}
}
]
}
}
}
- We are using the standard layers
So I don't know why the layers are not automatically included in the autocomplete query. However, I'm fine with passing layers=coarse,address,venue,neighbourhood,locality,localadmin,county,macrocounty,region,borough,country as a query parameter to get all the results.
Regarding the postcode I'll open a new issue to discuss this.
You don't have anything on the street layer; this is usually provided by https://github.com/pelias/polylines
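The same check can be done mechanically on the stats output. This is a minimal Python sketch, assuming the JSON printed by pelias elastic stats has been parsed into a dict with the `aggregations.sources.buckets[].layers.buckets[]` shape shown above; the helper names `layer_counts` and `missing_layers` are hypothetical, and the sample data is abbreviated from the output in this thread.

```python
def layer_counts(stats):
    """Sum document counts per layer across all sources."""
    counts = {}
    for source in stats["aggregations"]["sources"]["buckets"]:
        for layer in source["layers"]["buckets"]:
            counts[layer["key"]] = counts.get(layer["key"], 0) + layer["doc_count"]
    return counts


def missing_layers(stats, expected):
    """Return the expected layers that have no documents at all."""
    counts = layer_counts(stats)
    return [name for name in expected if counts.get(name, 0) == 0]


# Abbreviated from the stats output posted above.
stats = {
    "aggregations": {"sources": {"buckets": [
        {"key": "openstreetmap", "layers": {"buckets": [
            {"key": "address", "doc_count": 16250856},
            {"key": "venue", "doc_count": 2917470},
        ]}},
        {"key": "openaddresses", "layers": {"buckets": [
            {"key": "address", "doc_count": 6951114},
        ]}},
        {"key": "whosonfirst", "layers": {"buckets": [
            {"key": "locality", "doc_count": 12423},
        ]}},
    ]}}
}

print(missing_layers(stats, ["address", "street", "venue", "locality"]))
```

With the counts from this thread, only the street layer comes back as missing.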
Won't running pelias prepare all also prepare the polylines from the OSM data set?
Oh OK, after checking the docker_extract.sh script in pelias/polylines I noticed that the script exits if the pbf file is larger than 1GB. This is not obvious from the message, however. The message should probably be extended with a paragraph stating that the polyline extraction process is about to exit and that no polylines will therefore be extracted.
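To illustrate the kind of message meant here, the following Python sketch shows a size guard that says explicitly that extraction is being skipped. It is not the actual docker_extract.sh logic (which is a shell script): the function name `pbf_size_notice`, the message wording, and reading "1GB" as 10**9 bytes are all assumptions.

```python
# Assumption: "1GB" interpreted as 10**9 bytes; the real script may differ.
MAX_PBF_BYTES = 1_000_000_000


def pbf_size_notice(size_bytes, max_bytes=MAX_PBF_BYTES):
    """Return None if extraction may proceed, else an explicit notice.

    The notice states both the reason and the consequence, so the user
    knows that no polylines will be produced.
    """
    if size_bytes <= max_bytes:
        return None
    return (
        f"The pbf file is {size_bytes} bytes, which exceeds the limit of "
        f"{max_bytes} bytes. The polyline extraction will now exit and "
        "no polylines will be extracted."
    )


print(pbf_size_notice(3_500_000_000))
```

A caller would print the notice and exit when the return value is not None.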
https://github.com/pelias/polylines/pull/248 https://github.com/pelias/polylines/pull/248#issuecomment-601138283 https://github.com/pelias/docker/issues/198
I've gone ahead and updated the warning message in pelias/polylines#259.
I also figured out that the pelias.json config file expects an array of files under imports.polyline.files; however, bin/cli.js only uses the first file of the array when importing. See bin/cli.js#L39.
I would suggest either deprecating imports.polyline.files in favour of imports.polyline.file, which would expect a string, or changing bin/cli.js to support importing multiple polyline files.
@missinglink What do you think?
I'm reluctant to change config variables, just because you can deprecate them but people end up using the old version for years afterwards.
I opened this recently which might help: https://github.com/pelias/interpolation/pull/269
That looks good. For now, let's at least add a warning if more than one file is specified in the pelias.json config file.
Maybe this warning could be removed in the future, once support for importing multiple polyline files has been introduced.
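The proposed warning could look roughly like the following. This is a Python sketch of the idea only, not the code in bin/cli.js (which is JavaScript): the helper name `select_polyline_file`, the file names in the example, and the warning text are hypothetical, while the `imports.polyline.files` key and the "first file only" behaviour are taken from the discussion above.

```python
def select_polyline_file(config):
    """Pick the polyline file to import and collect any warnings.

    Mirrors the behaviour described above: only the first entry of
    imports.polyline.files is used, so extra entries trigger a warning.
    """
    files = config.get("imports", {}).get("polyline", {}).get("files", [])
    warnings = []
    if not files:
        return None, ["imports.polyline.files is empty; nothing to import"]
    if len(files) > 1:
        ignored = ", ".join(files[1:])
        warnings.append(
            f"imports.polyline.files lists {len(files)} files, but only "
            f"'{files[0]}' will be imported; ignoring: {ignored}"
        )
    return files[0], warnings


# Hypothetical config with two files listed.
config = {"imports": {"polyline": {"files": ["a.0sv", "b.0sv"]}}}
chosen, warnings = select_polyline_file(config)
print(chosen, warnings)
```

Returning the warnings instead of printing them leaves it to the caller to decide how they are logged.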