fix(search): Search not returning result if query text contains forward slash
We recently discovered an issue while performing Search operation on Datahub.
Issue: When a search query text contains a forward slash /, Search result does not return the expected result.
As an example:
We have a dataset exists on datahub with Name: databox.arn:aws:s3:::dl-odin-pro-datalake-16vmobiqqrhwn/odin/v2/autotrader-au/notification
When we search with the query text databox.arn:aws:s3:::dl-odin-pro-datalake-16vmobiqqrhwn/odin/v2/autotrader-au/notification from Datahub UI or GraphQL API, the search result does not return any result.
Fix:
Instead of 4 preceding backslashes "\\\\" we need to use 2 preceding backslashes "\\" while escaping Forward Slash.(forward slash / is a reserved character in elastic search)
The Problem was when we were searching with an input query string databox.arn:aws:s3:::dl-odin-pro-datalake-16vmobiqqrhwn/odin/v2/autotrader-au/notification the escapeForwardSlash method of ResolverUtils.java converted the input query string into databox.arn:aws:s3:::dl-odin-pro-datalake-16vmobiqqrhwn\\/odin\\/v2\\/autotrader-au\\/notification(added 2 backward slashes) then the query string again JSON Stringified by SimpleQueryStringBuilder while building the Search Query, so the final query string became databox.arn:aws:s3:::dl-odin-pro-datalake-16vmobiqqrhwn\\\\/odin\\\\/v2\\\\/autotrader-au\\\\/notification with additional 2 extra back ward slashes (\). Due to this, the elastic search query was failed to return any record.
Test Result with the fix:
Note: We also found that this class (datahub-frontend/app/utils/SearchUtil.java) also have the similar code https://github.com/datahub-project/datahub/blob/82e5a049ca27633bd3d74fae047ef42782c526f9/datahub-frontend/app/utils/SearchUtil.java#L23, but this is being used only from test cases. I updated the same as well in the PR.
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Summary by CodeRabbit
- Bug Fixes
- Improved handling of forward slash escape sequences in search functionality.
[!IMPORTANT]
Review skipped
Auto reviews are disabled on this repository.
Please check the settings in the CodeRabbit UI or the
.coderabbit.yamlfile in this repository. To trigger a single review, invoke the@coderabbitai reviewcommand.You can disable this status message by setting the
reviews.review_statustofalsein the CodeRabbit configuration file.
Walkthrough
The primary change modifies the escapeForwardSlash method in both SearchUtil and ResolverUtils to replace occurrences of / with \/ instead of \\\/. Corresponding test cases in SearchUtilTest were updated to reflect this change in the escape sequences for forward slashes.
Changes
| File Path | Change Summary |
|---|---|
.../app/utils/SearchUtil.java |
Modified escapeForwardSlash to replace / with \\/ instead of \\\/. |
.../test/utils/SearchUtilTest.java |
Updated test cases to reflect changes in escaping forward slashes and asterisks. |
.../graphql/resolvers/ResolverUtils.java |
Modified escapeForwardSlash to replace / with \/ instead of \\\/. |
Poem
In code's deep woods where slashes roam,
A tweak, a change, we now call home.
To/we add but one small line,
\/it sings, a tune divine.
Tests will pass, the code refined,
Our paths escape, so well-aligned.
🌟🐇✨
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>.Generate unit testing code for this file.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. Examples:@coderabbitai generate unit testing code for this file.@coderabbitai modularize this function.
- PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai generate interesting stats about this repository and render them as a table.@coderabbitai show all the console.log statements in this repository.@coderabbitai read src/utils.ts and generate unit testing code.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.@coderabbitai help me debug CodeRabbit configuration file.
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (Invoked using PR comments)
@coderabbitai pauseto pause the reviews on a PR.@coderabbitai resumeto resume the paused reviews.@coderabbitai reviewto trigger an incremental review. This is useful when automatic reviews are disabled for the repository.@coderabbitai full reviewto do a full review from scratch and review all the files again.@coderabbitai summaryto regenerate the summary of the PR.@coderabbitai resolveresolve all the CodeRabbit review comments.@coderabbitai configurationto show the current CodeRabbit configuration for the repository.@coderabbitai helpto get help.
Other keywords and placeholders
- Add
@coderabbitai ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaianywhere in the PR title to generate the title automatically.
CodeRabbit Configuration File (.coderabbit.yaml)
- You can programmatically configure CodeRabbit by adding a
.coderabbit.yamlfile to the root of your repository. - Please see the configuration documentation for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Documentation and Community
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
I think the issue here is that
SimpleQueryStringBuilderis not always involved in the query. For the cases whenSimpleQueryStringBuilderis not used I believe the quoting is correct. We likely need to apply your suggested change only whenSimpleQueryStringBuilderis used.
@david-leifker
The same problem exists with other Query Builders as well.
As an example, I found the same issue for match_phrase, match_phrase_prefix, term as well.
Please find below query generated for above-mentioned Query Builders for the same search input string.
{
"match_phrase" : {
"qualifiedName.wordGrams2" : {
"query" : "dl-odin-pro-datalake-16vmobiqqrhwn\\\\/odin\\\\/v2\\\\/autotrader-au\\\\/notification",
"slop" : 0,
"zero_terms_query" : "NONE",
"boost" : 14.400001,
"_name" : "qualifiedName"
}
}
}
{
"match_phrase_prefix" : {
"fullName.delimited" : {
"query" : "dl-odin-pro-datalake-16vmobiqqrhwn\\\\/odin\\\\/v2\\\\/autotrader-au\\\\/notification",
"slop" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"boost" : 4.48,
"_name" : "fullName"
}
}
}
{
"term" : {
"type.keyword" : {
"value" : "dl-odin-pro-datalake-16vmobiqqrhwn\\\\/odin\\\\/v2\\\\/autotrader-au\\\\/notification",
"case_insensitive" : true,
"boost" : 7.0,
"_name" : "type"
}
}
}
So considering this, I think that the fix provided on this PR on datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ResolverUtils.java should be necessary to fix this issue, wdyt?
Thanks!