Search syntax errors meta issue

Open turt2live opened this issue 3 years ago • 12 comments

Steps to reproduce

  1. Search an encrypted room for a user ID.

Outcome

What did you expect?

Search results :D

What happened instead?

image

Operating system

Windows 10

Application version

Nightly (2021-03-22)

How did you install the app?

The Internet

Homeserver

t2l.io

Will you send logs?

No

turt2live avatar Mar 23 '22 15:03 turt2live

Hi @turt2live, I am an Outreachy applicant. I want to work on this issue, can you assign it to me? Thank you.

EECvision avatar Mar 29 '22 08:03 EECvision

Sure, though do note that this might be a more challenging issue to solve - visit #element-dev:matrix.org on Matrix if you run into issues :)

turt2live avatar Mar 29 '22 14:03 turt2live

Alright, I will work on it. Thank you for assigning it to me.

EECvision avatar Mar 29 '22 15:03 EECvision

Hi @turt2live , I have looked at this issue. These are my findings below.

This issue occurs on the "nightly" build of Element Desktop, not Element Web. On the web version, instead of showing the error message, it simply says "No results"

When I looked further into the issue, I noticed that searching for @emmanuel_ezeka instead of @emmanuel_ezeka:matrix.org returns the expected result. Therefore, the issue is the colon : present in the searchTerm.

With the help of my mentor, I realized that Element Desktop uses the Rust library seshat for search, and seshat is based on tantivy. When we search for messages using tantivy, we can provide specific fields, like room_id:something. tantivy parses these name:value strings as documented at https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html. When we search for an mxid, the colon confuses tantivy and it thinks we are providing a name:value string. There is an open issue in tantivy about providing a more tolerant search interface: https://github.com/quickwit-oss/tantivy/issues/5 but it has not been worked on.

Currently, there are two approaches to solving the issue. The first approach solves the problem at hand, but it should be considered temporary.

  1. To replace the colon in the searchTerm with either a hyphen "-" or a question mark "?" or to remove ":matrix.org" entirely from the search term before giving it to the onSearch function.

  2. To work on tantivy to provide a more tolerant search interface.

I would like to implement the first approach while we look for a way to solve the issue permanently.
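A minimal sketch of the first approach, assuming the term is cleaned up before it reaches the search code. The helper name sanitizeSearchTerm is hypothetical and not part of the Element codebase, and whether tantivy then matches the original message text is not verified here.

```typescript
// Hypothetical helper (not from the Element codebase): replace every colon in
// the search term so the query parser no longer sees a name:value pair.
function sanitizeSearchTerm(searchTerm: string, replacement: string = "-"): string {
    // split/join replaces all occurrences without needing a regex.
    return searchTerm.split(":").join(replacement);
}

// Example: the user ID from the report above.
const sanitized: string = sanitizeSearchTerm("@emmanuel_ezeka:matrix.org");
console.log(sanitized);
```

The replacement character is a parameter so that "-" and "?" can both be tried against the real index.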

Thank you.

EECvision avatar Mar 29 '22 18:03 EECvision

Does tantivy offer a way to escape query parameters? For example, if we supplied it term:"@userid:example.org" (adding quotes). I'm not familiar with tantivy's query language, though I would be surprised if it didn't have a way to escape characters out of the query string.

If tantivy indeed does not have a way to escape the query, let's supply it with something that still results in the user ID being picked up in messages (this is important for moderation) but doesn't produce errors. Parsing the user ID with our existing permalink classes should give the localpart as a string, and somewhere we have a regex for detecting user IDs in strings.

turt2live avatar Mar 29 '22 18:03 turt2live

Using term:"@userid:example.org" would be another brilliant way to solve it. Let me try it out and see if it works.

EECvision avatar Mar 29 '22 19:03 EECvision

Hi @turt2live, I have not found a way to escape query parameters. However, removing :example.org from @userid:example.org solves the issue and makes sense as well, since the organization name (example.org) is the same for all users.

Should I go ahead and implement it?

EECvision avatar Mar 29 '22 19:03 EECvision

The domain will change depending on what the user is searching for - there should be a regex somewhere which identifies and parses user IDs that we can use instead.
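A rough sketch of what that could look like, assuming a simplified pattern of "@localpart:domain" with an optional port. The regex and the names USER_ID_PATTERN, findUserIds and stripUserIdDomains are hypothetical stand-ins, not the existing regex or permalink classes in the codebase.

```typescript
// Simplified, hypothetical pattern for user IDs: "@localpart:domain[:port]".
// The real grammar is stricter; this is only for illustration.
const USER_ID_PATTERN = /@([^\s:@]+):([^\s:@]+(?::\d+)?)/g;

interface ParsedUserId {
    userId: string;    // full match, e.g. "@userid:example.org"
    localpart: string; // e.g. "userid"
    domain: string;    // e.g. "example.org"
}

// Find every user ID in a search term, whatever its domain is.
function findUserIds(text: string): ParsedUserId[] {
    // Use a fresh regex so the shared pattern's lastIndex is never touched.
    const re = new RegExp(USER_ID_PATTERN.source, "g");
    const results: ParsedUserId[] = [];
    let match: RegExpExecArray | null;
    while ((match = re.exec(text)) !== null) {
        results.push({ userId: match[0], localpart: match[1], domain: match[2] });
    }
    return results;
}

// Rewrite each user ID to "@localpart" so no colon reaches the query parser.
function stripUserIdDomains(searchTerm: string): string {
    const re = new RegExp(USER_ID_PATTERN.source, "g");
    return searchTerm.replace(re, (_full: string, localpart: string) => `@${localpart}`);
}

console.log(findUserIds("@emmanuel_ezeka:matrix.org and @userid:example.org"));
console.log(stripUserIdDomains("messages from @userid:example.org"));
```

This pattern also swallows punctuation that directly follows a user ID (for example a trailing full stop), so a real implementation would need the stricter regex.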

turt2live avatar Mar 29 '22 19:03 turt2live

Yes, I understand what you mean.

EECvision avatar Mar 29 '22 20:03 EECvision

Hi @turt2live, can you look at the highlighted code below? It identifies and parses user IDs, so we could use it. error2

I would like to know what you think about it. Thank you

EECvision avatar Mar 29 '22 22:03 EECvision

It would probably be best to open a PR for wider team review at this stage, as others might have ideas for a solution which works most generally.

turt2live avatar Mar 30 '22 01:03 turt2live

Alright. Thank you.

EECvision avatar Mar 30 '22 06:03 EECvision