Question: treating untagged words or phrases as "full text search" across multiple (or all) fields

Open seandavi opened this issue 5 years ago • 6 comments

Luqum is working great for me and my test users, but one thing the test users miss is the query_string behavior of doing a full-text search across all fields when no field is specified (e.g., "London"). I see the ability to specify a default field, but this results in a simple match query. I guess I am looking to convert these to multi-match with all available text fields? Any suggestions?

seandavi avatar Mar 03 '19 13:03 seandavi

Hi @seandavi

You're right, this is not a supported scenario, but it is an interesting one.

Two solutions:

  • you modify your luqum search tree (using a TreeTransformer) before giving it to the Elasticsearch query builder, duplicating the bare term into one SearchField node per field and joining them with an OR.
  • you take courage and modify the query builder so that, if you pass a list as default_field, it builds a multi-match query! In which case a pull request is welcome :-)

If you need help in some way, just ask!

alexgarel avatar Mar 04 '19 10:03 alexgarel

For the time being, I'm going the cheap route and specifying _all as the default field for the match query. Users seem happy with the basic query_string behavior, which appears to pretty much use the _all approach.

If I have a little time, I may play with the multi-match approach. If I get into trouble, I'll let you know.

As usual, thanks for taking the time to answer and clarify.

seandavi avatar Mar 04 '19 13:03 seandavi

I know it has been a while on this one. I noticed a per-field version of multi_match was recently implemented. I'd like to revisit the idea of multi_match on a set of default fields for bare words. I like your idea of converting to multi_match when default_field is a list. Could you give me some hints on where to focus if I want to implement it? No urgency, but I thought I would ask.

seandavi avatar Nov 09 '19 01:11 seandavi

Just leaving a note here that to do this right would involve bare Word() and Phrase(), the latter requiring a different multi_match type.
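
As a rough sketch of what that means on the Elasticsearch side: a bare word can go into a default multi_match, while a quoted phrase needs "type": "phrase". This builds plain query dicts by hand and is not what luqum's query builder emits; the helper name bare_text_query and the field names are made up for illustration.

```python
def bare_text_query(text, fields):
    """Build a multi_match clause for bare text (hypothetical helper).

    A value wrapped in double quotes is treated as a phrase and gets
    the "phrase" multi_match type; anything else uses the default type.
    """
    is_phrase = len(text) >= 2 and text[0] == '"' and text[-1] == '"'
    body = {
        "query": text[1:-1] if is_phrase else text,
        "fields": list(fields),
    }
    if is_phrase:
        body["type"] = "phrase"
    return {"multi_match": body}


# Assumed example fields; substitute the text fields of your own index.
word_query = bare_text_query("London", ["title", "abstract"])
phrase_query = bare_text_query('"New York"', ["title", "abstract"])
```

The word case leaves "type" unset, so Elasticsearch applies its default multi_match type.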

seandavi avatar Nov 09 '19 01:11 seandavi

After a little playing with luqum.utils.LuceneTreeTransformer, this seems to do what I need. Note that multi_match is roughly translated to a bunch of OR queries across single-field match. The same is true of multi_match with phrases, except that match_phrase

import luqum.utils
from luqum.tree import Group, OrOperation, Range, SearchField


class BareTextTransformer(luqum.utils.LuceneTreeTransformer):
    """Convert bare Words or Phrases to full text search

    In cases where a query string has bare text (no field
    association), we want to construct a DSL query that includes
    all fields in an OR configuration to perform the full
    text search against all fields.
    This class walks the tree and converts bare Word and Phrase
    nodes into the required set of SearchField objects. Note
    that this is roughly equivalent to `multi_match`.
    Call `visit(tree)` and use the tree it returns.
    """
    def __init__(self, fields=('title', 'abstract')):
        """Create a new BareTextTransformer

        Parameters
        ----------
        fields: list of str
            This is the list of fields that will be used to
            create the composite SearchField objects that
            will be OR'ed together to simulate full text
            search.
        """
        super().__init__()
        # Copy so that instances never share (or mutate) the default.
        self.fields = list(fields)

    def _expand(self, node, parent):
        # Text already attached to a field or inside a range is left alone.
        if len(parent) > 0 and isinstance(parent[-1], (SearchField, Range)):
            return node
        # Bare text: one SearchField per configured field, OR'ed together.
        search_list = [SearchField(f, node) for f in self.fields]
        return Group(OrOperation(*search_list))

    def visit_word(self, node, parent):
        return self._expand(node, parent)

    def visit_phrase(self, node, parent):
        return self._expand(node, parent)

And, to use:

from luqum.parser import parser

tree = parser.parse(q)
transformer = BareTextTransformer()
# tree below now has expanded Group(OrOperation(...)) for each
# field in the BareTextTransformer `fields`
tree = transformer.visit(tree)
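
For comparison, here is roughly the shape of Elasticsearch query that an OR across single-field searches corresponds to, written as plain dicts rather than taken from luqum's actual output. The helper name or_expansion is hypothetical. Scoring is one reason the equivalence with multi_match is only approximate: the default multi_match type (best_fields) takes the best-scoring field, whereas a bool/should sums the scores of every matching clause.

```python
def or_expansion(text, fields, phrase=False):
    """Hand-built bool/should over per-field match clauses (illustrative only)."""
    clause_type = "match_phrase" if phrase else "match"
    return {
        "bool": {
            "should": [
                {clause_type: {field: {"query": text}}} for field in fields
            ]
        }
    }


# Assumed example fields, matching the transformer's defaults.
word_dsl = or_expansion("London", ["title", "abstract"])
phrase_dsl = or_expansion("New York", ["title", "abstract"], phrase=True)
```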

seandavi avatar Nov 09 '19 02:11 seandavi

Using a multi_match for the * field seems to work for me.

es_query_builder = ElasticsearchQueryBuilder(
    **schema_analyzer.query_builder_options(),
    field_options={"*": {"match_type": "multi_match"}},
)

thpica avatar Jul 13 '21 13:07 thpica