Search queries require all terms to match, or nothing is returned

Open MatMoore opened this issue 1 year ago • 1 comments

Describe the bug If users enter search queries with multiple terms, the query must be extremely precise to return results. Datahub will not return matches unless all of the search terms are present.

To Reproduce Steps to reproduce the behavior:

  1. Pick any entity in the catalogue
  2. Copy and paste some words from its description into search - it should show up in the search results
  3. Add or change a single term to something that doesn't match and then repeat the search - now nothing will be returned

A contrived example on the demo instance: searching for "This table has basic information about a customer, as well as some derived facts based on a customer's orders" finds the entity, whereas "This table has simple information about a customer, as well as some derived facts based on a customer's orders" (with "basic" changed to "simple") returns nothing.

The behaviour is the same in both the React frontend and in the GraphQL API.
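For anyone reproducing this through the GraphQL API, a request along the following lines could be used. This is only a minimal sketch: it assumes the `searchAcrossEntities` query and the `/api/graphql` endpoint path, and `base_url` and `token` are placeholders. Check the field names against the GraphQL schema of the DataHub version under test.

```python
import json
import urllib.request

# Assumed query shape; verify against your DataHub version's GraphQL schema.
SEARCH_QUERY = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    total
    searchResults { entity { urn type } }
  }
}
"""


def build_search_payload(text, count=10):
    """Build the JSON body for a free-text search (the text is sent as-is, unquoted)."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"types": [], "query": text, "start": 0, "count": count}
        },
    }


def run_search(base_url, token, text):
    """POST the search to a live instance and return the reported total hits."""
    request = urllib.request.Request(
        f"{base_url}/api/graphql",  # assumed endpoint path
        data=json.dumps(build_search_payload(text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["searchAcrossEntities"]["total"]
```

Calling `run_search` once with the original description text and once with a single word changed should show the difference in `total` described above.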

Expected behavior Provided that quotes are not used around the search term, I would expect that only one term needs to match for an entity to be returned in the search results. Entities that match some but not all terms should be ranked lower but not excluded from the result set.

This is likely to be particularly problematic for users who are less sure of what they are looking for and who tend towards natural language queries.

In our use case we are hoping to roll out the catalogue to a very diverse set of users, and there will be some user groups who work less closely with the data. These users would be impacted a lot if the search has low recall.
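To make the expected behaviour concrete, here is a rough sketch of "match any term, rank by how many terms match". It is plain term overlap over a hypothetical in-memory set of descriptions, not Elasticsearch's relevance scoring and not DataHub's implementation; the function name and sample data are made up for illustration.

```python
def rank_by_matched_terms(query, documents):
    """Return (doc_id, matched_count) pairs for every document that matches
    at least one query term, with documents matching more terms first."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if matched > 0:  # one matching term is enough to be included
            scored.append((doc_id, matched))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical catalogue descriptions, for illustration only.
docs = {
    "customers": "basic information about a customer",
    "orders": "orders placed by a customer",
    "pets": "adoption records",
}

results = rank_by_matched_terms("simple information about a customer", docs)
print(results)
```

Here the "customers" entry is still returned even though "simple" does not appear in its description, and "orders" is returned below it because it matches fewer terms; "pets" matches nothing and is excluded.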

Desktop (please complete the following information):

  • OS: MacOS
  • Browser: Chrome
  • Version: Tested in versions 0.13.1, 0.12.0

Additional context DataHub has an exactMatch config setting, but its exclusive option defaults to false, so this doesn't explain why we are seeing this exclusive behaviour.

      ## Configuration around exact matching for search
      exactMatch:
        ## if false will only apply weights, if true will exclude non-exact
        exclusive: false

This is also not the default behaviour of Elasticsearch's simple query string query, whose documentation says:

For example, a query string of capital of Hungary is interpreted as capital OR of OR Hungary.
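For comparison, the two interpretations can be written as `simple_query_string` request bodies that differ only in `default_operator`. This is a sketch of the Elasticsearch query DSL, not the query DataHub actually generates; the `description` field name is a hypothetical placeholder.

```python
def simple_query_string_body(text, fields, default_operator="OR"):
    """Build a search request body using Elasticsearch's simple_query_string query.

    default_operator="OR" means any term may match (the Elasticsearch default);
    default_operator="AND" requires every term to match.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


# "description" is a placeholder field name, not taken from DataHub's index mapping.
any_term = simple_query_string_body("capital of Hungary", ["description"])
all_terms = simple_query_string_body("capital of Hungary", ["description"], "AND")
```

The behaviour reported in this issue resembles the `AND` variant, even though nothing in the configuration shown above asks for it.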

MatMoore • Apr 18 '24 16:04

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] • May 19 '24 01:05

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] • Jun 18 '24 01:06