luqum
Keyword fields containing wildcards cannot be searched for exactly
Thank you very much for creating such an amazing library. 👍🏻
I have come across a special case: I have keyword fields whose values contain wildcard characters (* or ?). Elasticsearch handles this without any problem, but luqum seems to have difficulties with this use case.
Here is an example that uses ES to index a document whose keyword field contains wildcard characters:
```python
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="http://localhost:9200")
mappings = {"properties": {"vendor": {"type": "keyword"}}}
es.indices.create(index="test", mappings=mappings)
# refresh=True makes the document visible to the searches below immediately
es.index(index="test", body={"vendor": "f**k"}, id="example", refresh=True)
```
Now I want to search the field. The following query matches the document, but it is not what I want: it performs a wildcard search, not an exact term search.
```python
es.search(body={
    "query": {
        "query_string": {
            "query": "vendor:f**k"
        }
    }
}, index="test")
```
```
{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
          'max_score': 1.0,
          'hits': [{'_index': 'test',
                    '_id': 'example',
                    '_score': 1.0,
                    '_source': {'vendor': 'f**k'}}]}}
```
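Why a wildcard match is not the same as an exact match can be illustrated with Python's `fnmatch` module. This is only an approximation: `fnmatch` treats `*` and `?` roughly the way a Lucene wildcard query does, but it additionally interprets `[...]` character sets, which Lucene does not. The example values other than `f**k` are made up for illustration.

```python
from fnmatch import fnmatchcase

pattern = "f**k"  # what query_string sees when the asterisks are not escaped
values = ["f**k", "fork", "frisk", "fk", "folk music"]

for value in values:
    wildcard_hit = fnmatchcase(value, pattern)  # approximates a wildcard query
    exact_hit = value == pattern                # what a term query on a keyword field does
    print(f"{value!r}: wildcard={wildcard_hit}, exact={exact_hit}")
```

Every value that starts with `f` and ends with `k` matches the wildcard pattern, while only the literal `f**k` is an exact match.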
(1) To search for the exact value, you have to escape the wildcard characters. This works in ES:
```python
es.search(body={
    "query": {
        "query_string": {
            # raw string, so the backslashes reach Elasticsearch unchanged
            "query": r"vendor:f\*\*k"
        }
    }
}, index="test")
```
```
{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
          'max_score': 0.2876821,
          'hits': [{'_index': 'test',
                    '_id': 'example',
                    '_score': 0.2876821,
                    '_source': {'vendor': 'f**k'}}]}}
```
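The escaping in (1) can be automated when a literal value is embedded in a query string. The helper below is a hypothetical sketch, not part of ES or luqum. Its set of reserved characters follows the list in the Elasticsearch `query_string` documentation, which also states that `<` and `>` cannot be escaped at all, so they are left untouched here.

```python
# Characters the Elasticsearch query_string syntax treats as reserved.
# "<" and ">" are deliberately absent: per the ES docs they cannot be escaped.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(value: str) -> str:
    """Backslash-escape reserved characters so `value` is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in value)


query = "vendor:" + escape_query_string("f**k")
print(query)  # vendor:f\*\*k
```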
(2) Alternatively, you can use a phrase query. This also works in ES:
```python
es.search(body={
    "query": {
        "query_string": {
            "query": r'vendor:"f\*\*k"'
        }
    }
}, index="test")
```
```
{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
          'max_score': 0.2876821,
          'hits': [{'_index': 'test',
                    '_id': 'example',
                    '_score': 0.2876821,
                    '_source': {'vendor': 'f**k'}}]}}
```
However, when I try both (1) and (2) with luqum, neither produces an exact term query. This is the luqum setup:
```python
from luqum.elasticsearch import SchemaAnalyzer, ElasticsearchQueryBuilder

schema_analyzer = SchemaAnalyzer({"mappings": mappings})
es_builder = ElasticsearchQueryBuilder(**schema_analyzer.query_builder_options())
```
(1) Even when the "*" characters are escaped, luqum creates a wildcard query. This differs from ES and is not what I expected. The escape characters are not removed either; they end up in the wildcard value:
```python
from luqum.parser import parser

es_builder(parser.parse(r"vendor:f\*\*k"))
```
```
{'wildcard': {'vendor': {'value': 'f\\*\\*k'}}}
```
(2) When the search term is entered as a phrase, luqum also creates a wildcard query. This differs from ES as well and is not what I expected:
```python
from luqum.parser import parser

es_builder(parser.parse('vendor:"f**k"'))
```
```
{'wildcard': {'vendor': {'value': 'f**k'}}}
```
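For comparison, the sketch below spells out the behaviour that (1) and (2) expected, in plain Python rather than against luqum's internals. The functions `unescape`, `has_unescaped_wildcard` and `expected_keyword_query` are hypothetical and not part of luqum: a quoted phrase, or a term whose wildcards are all escaped, becomes a `term` query with the escapes removed, and only a term with at least one unescaped `*` or `?` becomes a `wildcard` query.

```python
def unescape(text: str) -> str:
    """Drop backslash escapes, e.g. r"f\*\*k" -> "f**k"."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 1  # skip the backslash, keep the escaped character
        out.append(text[i])
        i += 1
    return "".join(out)


def has_unescaped_wildcard(text: str) -> bool:
    """True if '*' or '?' occurs without a preceding escape backslash."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] in "*?":
            return True
        i += 1
    return False


def expected_keyword_query(field: str, raw: str) -> dict:
    """Hypothetical mapping from a query_string value to an ES query on a keyword field."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Phrase: wildcards are literal, so search for the exact value.
        return {"term": {field: {"value": unescape(raw[1:-1])}}}
    if has_unescaped_wildcard(raw):
        # Assumption: Lucene wildcard queries also treat '\' as an escape,
        # so the raw value is passed through unchanged.
        return {"wildcard": {field: {"value": raw}}}
    return {"term": {field: {"value": unescape(raw)}}}


print(expected_keyword_query("vendor", r"f\*\*k"))  # {'term': {'vendor': {'value': 'f**k'}}}
print(expected_keyword_query("vendor", '"f**k"'))   # {'term': {'vendor': {'value': 'f**k'}}}
print(expected_keyword_query("vendor", "f**k"))     # {'wildcard': {'vendor': {'value': 'f**k'}}}
```

The unescaped `vendor:f**k` still yields a wildcard query, matching the ES behaviour shown at the start.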
I don't see any way to write a query string so that a term containing "*" is searched for exactly.
Regards, André