elasticsearch-dsl-py

Issue querying

Open zeitler opened this issue 3 years ago • 4 comments

Hi, I've been googling and I'm having a lot of trouble working out how to query properly.

Having:

*********** models.py *******************
class Account(models.Model):
    name = models.CharField(max_length=128, db_index=True)
    description = models.TextField(blank=True, null=True)
    email = models.EmailField(max_length=254, db_index=True)
    ....

********* analyzers.py *********************
def ngram_filter(min=2, max=15):
    return token_filter(
        f"ngram_{min}_{max}_filter", type="ngram", min_gram=min, max_gram=max
    )

def edge_gram_filter(min=2, max=15):
    return token_filter(
        f"edgegram_{min}_{max}_filter", type="edge_ngram", min_gram=min, max_gram=max
    )

def stop_words_filter():
    return [
        english_stop_words_filter,
        portuguese_stop_words_filter,
    ]

filters = stop_words_filter()
filters.append(ngram_filter(3, 3))
filters.append(edge_gram_filter(2, 15))
full_searchable_analyzer = analyzer(
    "full_searchable_analyzer", tokenizer="keyword", filter=filters
)
string_sort_analyzer = analyzer(
    'string_sort', type="keyword", filter=["lowercase"]
)

************** documents.py ********************
@registry.register_document
class AccountDocument(Document):
    name = fields.TextField(
        attr="name",
        fields={
            'raw': fields.TextField(
                analyzer=full_searchable_analyzer,
                search_analyzer=string_sort_analyzer
            ),
            'suggest': fields.CompletionField(),
        }
    )
    description = fields.TextField(
        fields={
            'raw': fields.TextField(
                analyzer=string_sort_analyzer,
                search_analyzer=string_sort_analyzer
            ),
            'suggest': fields.CompletionField(),
        }
    )

    class Index:
        name = 'accounts'
        settings = {'number_of_shards': 1, 'number_of_replicas': 0}

    class Django:
        model = Account
        fields = [
            'email',
        ]


objects data:
1-> {name: 'teste', description: 'testify a common taste', email: '[email protected]'}
2-> {name: 'testJonhy', description: ' asdasdkasçkdkldas', email: '[email protected]'}
3-> {name: 'Mariah', description: 'desctestsherealso', email: '[email protected]'}

s.query(MultiMatch(query='test', fields=fields, fuzziness=10)).execute() returns record 1

s.query("match_phrase", query='test').execute() returns nothing

q = s.filter("match_phrase", query='test').execute() returns nothing also

How can I write these queries properly? The goal is for the query to return all of these documents.

I also intend to add highlighting and a "Did you mean" feature. I have already implemented suggestions with: s.suggest('name', 'test', completion={'field': 'name.suggest'}).execute()

Can someone help me or point me to some documentation where I can figure this out?

Thanks

zeitler, May 13 '21 15:05

Instead of s.query("match_phrase", query='test').execute()

use s.query("match_phrase", name='test').execute()

Similarly, for the filter query, replace query with the name of the field you are looking into. Elasticsearch expects both the field name and the query text you want to search for.

Match phrase query is similar to the match query but is used to query text phrases. Phrase matching is necessary when the ordering of the words is important. Only the documents that contain the words in the same order as the search input are matched.
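To make the difference concrete, here is a toy model in plain Python. It does not call Elasticsearch: `tokenize`, `match_any`, and `match_phrase` are hypothetical helpers, and the whitespace-plus-lowercase tokenization is only a rough stand-in for a real analyzer.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a rough stand-in for an analyzer)."""
    return text.lower().split()


def match_any(doc_text, query):
    """Toy 'match' with OR semantics: at least one query token is a document token."""
    doc_tokens = set(tokenize(doc_text))
    return any(token in doc_tokens for token in tokenize(query))


def match_phrase(doc_text, query):
    """Toy 'match_phrase': the query tokens must appear adjacent and in order."""
    doc, wanted = tokenize(doc_text), tokenize(query)
    if not wanted:
        return False
    return any(
        doc[i:i + len(wanted)] == wanted
        for i in range(len(doc) - len(wanted) + 1)
    )


text = "testify a common taste"
print(match_any(text, "taste common"))     # True: word order is ignored
print(match_phrase(text, "taste common"))  # False: wrong order
print(match_phrase(text, "common taste"))  # True: same order as the document
print(match_any(text, "test"))             # False: "test" is not a whole token
```

The last line hints at why a plain match on "test" finds nothing here: in this model, matching works on whole tokens, not on substrings.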

From your question, I gather that you just want to match "test" against the name field. So try a match query, like s.query("match", name='test').execute()

For this s.query(MultiMatch(query='test', fields=fields, fuzziness=10)).execute()

Try this, s.query(MultiMatch(query='test', fields=['name', 'description'], fuzziness='AUTO')).execute()

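Let Elasticsearch take care of fuzziness: 'AUTO' chooses the allowed edit distance from the term length, and the maximum supported edit distance is 2, so a value of 10 gains nothing.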
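As a rough illustration of what fuzziness measures, the sketch below computes the plain Levenshtein distance between "test" and a few tokens from the sample data. `edit_distance` is a hypothetical helper written for this example, not part of elasticsearch-dsl, and Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this simplified version does not. It also assumes the field is analyzed into whole lowercase words.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or keep when equal)
            ))
        prev = curr
    return prev[-1]


for term in ["teste", "taste", "testify", "testjohny"]:
    print(term, edit_distance("test", term))
# teste 1
# taste 2
# testify 3
# testjohny 5
```

With 'AUTO', a term of 3 to 5 characters such as "test" is allowed one edit, so under these assumptions only the token "teste" is close enough. Longer values that merely contain "test" are too far away for fuzziness to reach.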

Sachin-Kahandal, May 14 '21 17:05

Hi @Sachin-Kahandal. Thank you very much for your help

Still not having the desired results.

Given these records:

ID | Name       | Description            | Email
1  | admin      |                        | [email protected]
2  | Constatine | sad                    | [email protected]
3  | Mariah     | desctestsherealso      | [email protected]
4  | teste      | testify a common taste | [email protected]
5  | testJohny  | asdasdkasçkdkldas      | [email protected]

The goal is to get 3 results:
Record 3, because the description contains "test" in "desctestsherealso"
Record 4, because the name contains "test" in "teste"
Record 5, because the name contains "test" in "testJohny"

Tests:

Testing: s.query("match", name="test").execute().hits.total FAILED: expected: 3, obtained: 0

Testing: s.query("match", query="test").execute().hits.total FAILED: expected: 3, obtained: 0

Testing: s.filter("match", name="test").execute().hits.total FAILED: expected: 3, obtained: 0

Testing: s.query(MultiMatch(query="test", fields=["name", "description"])).execute().hits.total FAILED: expected: 3, obtained: 0

Testing: s.query(MultiMatch(query="test", fields=["name", "description"], fuzziness="AUTO")).execute().hits.total FAILED: expected: 3, obtained: 1

If I understood correctly, MultiMatch will return documents where "test" is in "name" AND in "description".

But what I want is documents where it is in the name OR in the description.

The application has a landing search page. It is meant to show all the documents that contain the search terms, and once I have the results I need to highlight the matching words.

Is the definition of the fields correct?
...
name = fields.TextField(
    attr="name",
    fields={
        'raw': fields.TextField(
            analyzer=full_searchable_analyzer,
            search_analyzer=string_sort_analyzer
        ),
        'suggest': fields.CompletionField(),
    }
)
...

kind regards, Thank you very much

zeitler, May 17 '21 16:05

Hi @zeitler, OK, now that I understand your problem:

  • What you need to solve this sort of problem is nGram/edgeNGram tokenizer.
  • These tokenizers break up text into configurable-sized tuples of letters.
  • For instance, the word "news", run through a min_gram:1, max_gram:2 nGram tokenizer would be broken up into the tokens "n", "e", "w", "s", "ne", "ew", and "ws".
  • This sort of analysis does really well when it comes to imprecise matching.
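The tokenization described above can be sketched in a few lines of plain Python. `ngrams` is a hypothetical helper for illustration only; the real tokenizer runs inside Elasticsearch and may emit the same tokens in a different order.

```python
def ngrams(text, min_gram, max_gram):
    """Return every character n-gram of length min_gram..max_gram, shortest first."""
    tokens = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.append(text[start:start + size])
    return tokens


print(ngrams("news", 1, 2))
# ['n', 'e', 'w', 's', 'ne', 'ew', 'ws']

# With 4-character grams, "test" becomes one of the indexed tokens of a longer value.
print("test" in ngrams("desctestsherealso", 4, 4))  # True
print("test" in ngrams("mariah", 4, 4))             # False
```

This is why n-grams help with the "contains" style of search asked about here: the search term only has to equal one of the indexed grams, not the whole word.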

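Also, with MultiMatch you can pass the operator of your choice, like query = MultiMatch(query=text, fields=['name', 'description'], operator="OR"). Note that field names are case-sensitive, so they must match the names in the document definition.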
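On the AND/OR question: as far as I understand multi_match, a document is returned when any one of the listed fields matches, while operator controls how the individual words of the query text are combined. The toy model below shows the intended result on the records from this thread (emails omitted). It is plain Python, not Elasticsearch: `field_matches` and `search` are hypothetical helpers, and the substring check assumes the fields are indexed with n-grams that cover the length of the search term.

```python
RECORDS = [
    {"id": 1, "name": "admin", "description": ""},
    {"id": 2, "name": "Constatine", "description": "sad"},
    {"id": 3, "name": "Mariah", "description": "desctestsherealso"},
    {"id": 4, "name": "teste", "description": "testify a common taste"},
    {"id": 5, "name": "testJohny", "description": "asdasdkasçkdkldas"},
]


def field_matches(value, text):
    """Model an n-gram-indexed field: the term matches if it occurs as a substring."""
    return text.lower() in value.lower()


def search(records, text, fields):
    """Return the ids of records where ANY of the listed fields matches."""
    return [
        record["id"]
        for record in records
        if any(field_matches(record[field], text) for field in fields)
    ]


print(search(RECORDS, "test", ["name", "description"]))
# [3, 4, 5]
```

Records 3, 4 and 5 are exactly the three results listed as the goal above.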

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html Examples: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

Hope this helps

Sachin-Kahandal, May 22 '21 16:05

@zeitler, did you manage to find a solution? If so, can you post it and close the issue?

Brechard, Sep 14 '21 10:09