django-elasticsearch-dsl-drf

How to create a boolean query with both "should" and "must" clauses?

Open yattias opened this issue 3 years ago • 4 comments

Questions

Hi @barseghyanartur. First, thanks for this great package. It has been extremely useful.

I was unable to find an answer to my question in the docs or by examining the source code, so I figured I'd ask here.

Basically, I need to generate a boolean query in which one part is in a must clause and the other is in a should clause. More specifically, the query I would like to generate is as follows:

  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [ SOME_FIELDS ],
            "operator": "and",
            "query": "SOME QUERY TERMS"
          }
        }
      ],
      "should": [
        {
          "term": {
            "SPECIFIC_FIELD": "SOME QUERY TERMS"
          }
        }
      ]
    }
  }

The reason for the above is to boost a phrase match.
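
For reference, the target body above can be sketched as a plain Python dict builder. This is only a minimal illustration of the desired request body: the function name `build_boosted_query` and its parameters are hypothetical and are not part of django-elasticsearch-dsl-drf.

```python
import json


def build_boosted_query(terms, fields, boost_field):
    """Return a bool query: a required multi_match plus an optional term clause.

    All names here are illustrative; this only builds the request body.
    """
    return {
        "query": {
            "bool": {
                # Every returned document must satisfy the multi_match clause.
                "must": [
                    {
                        "multi_match": {
                            "fields": list(fields),
                            "operator": "and",
                            "query": terms,
                        }
                    }
                ],
                # Optional clause: documents that also match it score higher.
                "should": [{"term": {boost_field: terms}}],
            }
        }
    }


if __name__ == "__main__":
    body = build_boosted_query("deep learning", ["title", "abstract"], "title")
    print(json.dumps(body, indent=2))
```

Because the should clause sits next to a must clause in the same bool query, it only influences scoring and does not filter anything out.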

However, whenever I combine the following backends:

filter_backends = [
    PhraseSearchFilterBackend,  # Custom
    MultiMatchSearchFilterBackend,
]

my term query ends up in a must clause, even though I specify matching="should" pretty much everywhere.

I even debugged this all the way down to base.py, where I confirmed that matching="should", yet the final query still ends up entirely in the must clause.

Any ideas what I'm doing wrong?
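
One possible explanation, not confirmed anywhere in this thread: Elasticsearch documents that a bool query with should clauses but no must or filter clauses defaults minimum_should_match to 1. A should-only bool query that is then AND-ed with another query therefore behaves like a required clause. The sketch below only encodes that documented default; the helper name is hypothetical, and whether the package's backends combine queries this way is an assumption.

```python
def default_minimum_should_match(bool_body):
    """Default minimum_should_match Elasticsearch applies to a bool query body.

    1 when there are should clauses but no must or filter clauses, else 0.
    """
    has_should = bool(bool_body.get("should"))
    has_required = bool(bool_body.get("must")) or bool(bool_body.get("filter"))
    return 1 if has_should and not has_required else 0


# A should-only bool query: its single clause is effectively required.
standalone = {"should": [{"term": {"title": "deep learning"}}]}

# The same clause next to a must clause in one bool query: now optional.
combined = {
    "must": [{"multi_match": {"query": "deep learning", "fields": ["title"]}}],
    "should": [{"term": {"title": "deep learning"}}],
}

print(default_minimum_should_match(standalone))  # 1
print(default_minimum_should_match(combined))    # 0
```

If this is what is happening, the should clause would need to live in the same bool query as the must clause, rather than in a separate bool query of its own.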

yattias avatar Jun 28 '21 02:06 yattias

For reference, here is my configuration:

class PaperDocumentView(DocumentViewSet):
    document = PaperDocument
    permission_classes = [ReadOnly]
    serializer_class = PaperDocumentSerializer
    pagination_class = LimitOffsetPagination
    lookup_field = 'id'
    filter_backends = [
        PhraseSearchFilterBackend,
        MultiMatchSearchFilterBackend,
        CompoundSearchFilterBackend,
        FacetedSearchFilterBackend,
        FilteringFilterBackend,
        PostFilterFilteringFilterBackend,
        DefaultOrderingFilterBackend,
        OrderingFilterBackend,
        HighlightBackend,
    ]

    search_fields = {
        'doi': {'boost': 3, 'fuzziness': 1},
        'title': {'boost': 2, 'fuzziness': 1},
        'raw_authors.full_name': {'boost': 1, 'fuzziness': 1},
        'abstract': {'boost': 1, 'fuzziness': 1},
        'hubs_flat': {'boost': 1, 'fuzziness': 1},
    }

    multi_match_search_fields = {
        'doi': {'boost': 3, 'fuzziness': 1},
        'title': {'boost': 2, 'fuzziness': 1},
        'raw_authors.full_name': {'boost': 1, 'fuzziness': 1},
        'abstract': {'boost': 1, 'fuzziness': 1},
        'hubs_flat': {'boost': 1, 'fuzziness': 1},
    }

    multi_match_options = {
        'operator': 'and'
    }

    post_filter_fields = {
        'hubs': 'hubs.name',
    }

    faceted_search_fields = {
        'hubs': 'hubs.name'
    }

    filter_fields = {
        'publish_date': 'paper_publish_date'
    }

    ordering = ('_score', '-hot_score', '-discussion_count', '-paper_publish_date')

    ordering_fields = {
        'publish_date': 'paper_publish_date',
        'discussion_count': 'discussion_count',
        'score': 'score',
        'hot_score': 'hot_score',
    }

    highlight_fields = {
        'raw_authors.full_name': {
            'field': 'raw_authors',
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 1000,
                'number_of_fragments': 10,
            },
        },
        'title': {
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 2000,
                'number_of_fragments': 1,
            },
        },
        'abstract': {
            'enabled': True,
            'options': {
                'pre_tags': ["<mark>"],
                'post_tags': ["</mark>"],
                'fragment_size': 5000,
                'number_of_fragments': 1,
            },
        }
    }

yattias avatar Jun 28 '21 02:06 yattias

Wondering if someone here can help 🙏

yattias avatar Jun 30 '21 18:06 yattias

Override the base class's get_queryset() and define your own queries there, rather than using the search filter backends:

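from elasticsearch_dsl import Q, Search
from elasticsearch_dsl.query import Bool

def get_queryset(self):
    # Get the search parameter from the request.
    text_raw = self.request.GET.get("search", "")
    # Example clauses; the field names are placeholders, so adapt them to your document.
    query0 = Q("multi_match", query=text_raw, fields=["title", "abstract"])
    query1 = Q("match", title=text_raw)
    query2 = Q("match_phrase", title=text_raw)
    # etc... (define any further clauses the same way)
    # Place each clause under must= and/or should= as needed.
    q1 = Bool(should=[query0, query1, query2])
    queryset = Search(using=self.client, index=self.index, doc_type=self.document._doc_type.name).query(q1)
    return queryset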

This gives you finer control over your queries.

Sachin-Kahandal avatar Jul 21 '22 06:07 Sachin-Kahandal

This question comes up regularly. I'll add it to the FAQ, but TL;DR:

If you need a combination of ANDs and ORs, use SimpleQueryStringSearchFilterBackend. See the examples here and in the docs.
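
As a rough illustration of what such a query looks like: in Elasticsearch's simple_query_string syntax, `+` means AND, `|` means OR, `-` negates a token, and double quotes mark a phrase. The sketch below builds such an expression and the corresponding query body. Both helper names are hypothetical, and how the string is passed to SimpleQueryStringSearchFilterBackend is left to the package docs rather than shown here.

```python
def all_terms_or_phrase(terms):
    """Build a simple_query_string expression: every word, or the exact phrase.

    Hypothetical helper; it does not escape reserved characters in `terms`.
    """
    words = terms.split()
    all_words = " + ".join(words)          # "+" means AND
    phrase = '"' + " ".join(words) + '"'   # double quotes mark a phrase
    return f"({all_words}) | {phrase}"     # "|" means OR


def simple_query_string_body(expression, fields):
    """Wrap the expression in an Elasticsearch simple_query_string query body."""
    return {
        "query": {
            "simple_query_string": {
                "query": expression,
                "fields": list(fields),
            }
        }
    }


if __name__ == "__main__":
    expression = all_terms_or_phrase("deep learning")
    print(expression)  # (deep + learning) | "deep learning"
    print(simple_query_string_body(expression, ["title", "abstract"]))
```

This expresses an AND/OR combination in a single query; it is not identical to the must-plus-should scoring behavior asked about at the top of the thread.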

barseghyanartur avatar Jul 21 '22 07:07 barseghyanartur