New queries

Open markostijak opened this issue 6 years ago • 1 comments

I would like to contribute new types of queries that are suitable for reverse-search applications. The new queries would be:

  1. StringIsStartingWith - the opposite of StringStartsWith
  2. StringIsEndingWith - the opposite of StringEndsWith
  3. StringIsContainedInOrder - a slightly different version of StringIsContainedIn: when given a list of strings, they must appear in the document in that order. This would be suitable for wildcard queries, e.g. s1*s2.
  4. StringIsMatchedByRegex - maybe in a later iteration
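To make the difference in item 3 concrete, here is a minimal plain-Java sketch of unordered versus ordered containment. The class and method names are hypothetical and are not part of CQEngine; the sketch also assumes that fragments may not overlap, which the proposal does not specify.

```java
import java.util.List;

// Hypothetical helper, not CQEngine API: compares unordered and ordered containment.
final class OrderedContainment {

    // Every fragment occurs somewhere in the document, in any order.
    static boolean containsAll(String document, List<String> fragments) {
        for (String fragment : fragments) {
            if (!document.contains(fragment)) {
                return false;
            }
        }
        return true;
    }

    // Every fragment occurs in the document, in list order, without overlapping
    // (the non-overlap rule is an assumption of this sketch).
    static boolean containsInOrder(String document, List<String> fragments) {
        int from = 0;
        for (String fragment : fragments) {
            int at = document.indexOf(fragment, from);
            if (at < 0) {
                return false;
            }
            from = at + fragment.length(); // continue searching after this match
        }
        return true;
    }
}
```

With the fragments `s1` and `s2`, the document `"s2...s1"` passes `containsAll` but fails `containsInOrder`, which is the behaviour a wildcard such as `s1*s2` needs.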

The first two queries can be replaced with StringIsContainedIn, but for big documents I don't think that is an appropriate solution.

For the first three queries, I think InvertedRadixTreeIndex would be a suitable index.
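As a rough illustration of why a tree-shaped index fits StringIsStartingWith, the sketch below stores rule strings in a simple character trie and walks the document once, collecting every stored rule that is a prefix of it. This is a hypothetical stand-in written for this discussion, not CQEngine's InvertedRadixTreeIndex, and `PrefixRuleTrie` and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not CQEngine's InvertedRadixTreeIndex.
final class PrefixRuleTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored rule ends at this node
    }

    private final Node root = new Node();

    // Stores one rule string, character by character.
    void add(String rule) {
        Node node = root;
        for (int i = 0; i < rule.length(); i++) {
            node = node.children.computeIfAbsent(rule.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Returns every stored rule that is a prefix of the document, shortest first.
    List<String> rulesStartingDocument(String document) {
        List<String> matches = new ArrayList<>();
        Node node = root;
        if (node.terminal) {
            matches.add(""); // the empty rule is a prefix of everything
        }
        for (int i = 0; i < document.length(); i++) {
            node = node.children.get(document.charAt(i));
            if (node == null) {
                break; // no stored rule continues along this path
            }
            if (node.terminal) {
                matches.add(document.substring(0, i + 1));
            }
        }
        return matches;
    }
}
```

The walk visits at most one node per document character, so the lookup cost depends on the document length rather than on the number of stored rules. A StringIsEndingWith variant could presumably store reversed rules and walk the document backwards, but that is an assumption, not something the proposal states.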

An example use case for these queries: suppose we have a URL monitoring application in which a user is notified when a certain URL appears. Users can define rules that are used to match those URLs. The rules could be:

  1. URLs that start with a specific string (e.g. https://www.amazon.com) - StringIsStartingWith query
  2. URLs that end with a specific string (e.g. .com) - StringIsEndingWith query
  3. URLs that contain a specific string (e.g. shipping) - StringIsContainedIn query
  4. URLs that match a wildcard (e.g. amazon/*/cart) - StringIsContainedInOrder query
  5. URLs that are equal to a specific string (e.g. https://www.amazon.com) - Equal query
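The five rule types above can be sketched in plain Java as a linear scan over the rules, without any index. All names here (`UrlRules`, `Kind`, `Rule`, `matching`) are hypothetical and only illustrate the intended semantics. The wildcard case assumes `*` is the only special character and treats the pattern as unanchored, mirroring the StringIsContainedInOrder mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule matcher for the URL monitoring example; not CQEngine API.
final class UrlRules {

    enum Kind { STARTS_WITH, ENDS_WITH, CONTAINS, WILDCARD, EQUAL }

    static final class Rule {
        final Kind kind;
        final String pattern;

        Rule(Kind kind, String pattern) {
            this.kind = kind;
            this.pattern = pattern;
        }
    }

    // Tests one incoming URL against one stored rule.
    static boolean matches(Rule rule, String url) {
        switch (rule.kind) {
            case STARTS_WITH: return url.startsWith(rule.pattern);
            case ENDS_WITH:   return url.endsWith(rule.pattern);
            case CONTAINS:    return url.contains(rule.pattern);
            case EQUAL:       return url.equals(rule.pattern);
            case WILDCARD: {
                // Split on '*' and require the pieces to appear in order.
                int from = 0;
                for (String fragment : rule.pattern.split("\\*")) {
                    int at = url.indexOf(fragment, from);
                    if (at < 0) {
                        return false;
                    }
                    from = at + fragment.length();
                }
                return true;
            }
            default: throw new IllegalArgumentException("Unknown kind: " + rule.kind);
        }
    }

    // Returns the rules that the incoming URL satisfies (the "reverse search").
    static List<Rule> matching(List<Rule> rules, String url) {
        List<Rule> hits = new ArrayList<>();
        for (Rule rule : rules) {
            if (matches(rule, url)) {
                hits.add(rule);
            }
        }
        return hits;
    }
}
```

This scan costs one check per rule for every URL; the point of the proposed queries is to let an index answer the same question without visiting every rule.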

What is your opinion on this?

markostijak avatar Jul 02 '19 08:07 markostijak

I think those would be great additions, and would be much appreciated!

Feel free to go ahead and put together a PR, and I'm happy to help if you have any questions!

npgall avatar Jul 02 '19 18:07 npgall