Examine Escape() returns 0 results, when the Native query does

When using the Escape() query it brings back no results, but if I extract the RAW query and run it, it returns the expected number of results.

This brings back 0 results:

if (!string.IsNullOrEmpty(criteria.Colour))
{
  query = query.And().Field("colour",  criteria.Colour.Escape());
}

But if I extract the RAW query and run it... It works:

string stringToParse = query.ToString();
int indexOfPropertyValue = stringToParse.IndexOf("LuceneQuery:") + 12;
string rawQuery = stringToParse.Substring(indexOfPropertyValue).TrimEnd('}');
var response = index.Searcher.CreateQuery("content").NativeQuery(rawQuery).Execute(QueryOptions.SkipTake((criteria.CurrentPage - 1) * criteria.PageSize, criteria.PageSize));

Feb 22 '23 18:02 AaronSadlerUK

@AaronSadlerUK .Field and .NativeQuery do fundamentally different things.

An escaped field search builds a "phrase query" and as such doesn't need to go through the query parser, whilst a native query will use the query parser to turn your string into an actual query and execute it "raw". The phrase query ensures an exact match for each term and not parts of each term.

What "type" is your "colour" field in the index, and what format are you expecting the value to be in? The likelihood here is that your value isn't indexed quite as you'd expect.

e.g. "light-blue" would actually be indexed as "light" and "blue" by default – searching for "light-blue".Escape() will do a phrase query for "light-blue" but that's not the value in the index. You'd need to index the value accordingly, such as changing the analyzer for that specific case.
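To see what an analyzer actually does to a value before it reaches the index, something like the following can help. It is only a sketch: it assumes Lucene.NET 4.8 (the version Examine 3.x builds on) and that the field is analyzed by something equivalent to StandardAnalyzer. The field name and sample value are placeholders.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Assumption: the field uses StandardAnalyzer-style analysis.
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

// Run the sample value through the analyzer and print each token it emits.
using (TokenStream stream = analyzer.GetTokenStream("colour", new StringReader("Light-Blue")))
{
    var term = stream.AddAttribute<ICharTermAttribute>();
    stream.Reset();
    while (stream.IncrementToken())
    {
        Console.WriteLine(term.ToString());
    }
    stream.End();
}
```

If the explanation above holds, this should print the separate lowercased tokens rather than the original string, which is why an exact match on the original value finds nothing.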

Use a tool like Luke to inspect your index and find out what's happening.

Feb 24 '23 11:02 callumbwhyte

Thanks for that @callumbwhyte, I am trying to open the indexes with Luke but having a nightmare 😅

Once I manage to get in I'll see what's happening in terms of the Indexed value. However it does work with the backoffice and as a raw query, so for example if I search with "Dark Blue" using the nativequery it works.

But if I do it using .Escape() it does not, I should know more if I can get Luke to work

Feb 24 '23 11:02 AaronSadlerUK

The backoffice is a whole different beast again ;-)

Feb 24 '23 11:02 callumbwhyte

@callumbwhyte Any ideas with this error?

I've tried rebuilding etc...

Feb 24 '23 11:02 AaronSadlerUK

Finally found a version which can read the indexes... 5.2.0

Can be found here for anyone looking: https://github.com/DmitryKey/luke/releases/tag/luke-5.2.0

Feb 24 '23 11:02 AaronSadlerUK

Colour looks like this in the index:

It's also indexed as FullText.

This field is used as an attribute, so searching on it is always exact:

namedOptions.FieldDefinitions.AddOrUpdate(new FieldDefinition("colour", FieldDefinitionTypes.FullText));

Any other thoughts or pointers?

Feb 24 '23 12:02 AaronSadlerUK

@AaronSadlerUK If you right click on the value you can view the tokens, you should see 2 tokens: "light" and "green".

If you're trying to match 'Light Green' to either of those terms it won't match.

Rather than indexing your field as FullText you could opt for FieldDefinitionTypes.Raw, which shouldn't be analyzed, so the value is indexed as the single token "Light Green", exactly as you expect.

You could also modify the value at index time in the TransformingIndexValues event, perhaps removing the space entirely?
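A minimal sketch of the Raw option, assuming an Umbraco 10+ site where index options are configured through IConfigureNamedOptions. The class name and the choice of the external index are assumptions for illustration; only the "colour" field name comes from this thread.

```csharp
using Examine;
using Examine.Lucene;
using Microsoft.Extensions.Options;
using Umbraco.Cms.Core;

// Hypothetical options class: registers "colour" as a Raw (not analyzed) field.
public class ConfigureColourFieldOptions : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
{
    public void Configure(string name, LuceneDirectoryIndexOptions options)
    {
        // Assumption: the field lives in the external index.
        if (name != Constants.UmbracoIndexes.ExternalIndexName)
        {
            return;
        }

        // Raw keeps the whole value as one token, so an exact match can find it.
        options.FieldDefinitions.AddOrUpdate(
            new FieldDefinition("colour", FieldDefinitionTypes.Raw));
    }

    // The unnamed overload is part of the interface; it falls through to a no-op here.
    public void Configure(LuceneDirectoryIndexOptions options)
        => Configure(Options.DefaultName, options);
}
```

This would be registered with something like builder.Services.ConfigureOptions<ConfigureColourFieldOptions>() in a composer, and the index would need rebuilding for the new field type to take effect.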

Feb 24 '23 12:02 callumbwhyte

I was thinking about removing the space, but then I would need to create a whole thing to remove all the other different characters which are used such as / and - in different places.

I'll try the RAW way.

Am I right in thinking what I'm trying to do here would normally be done as a faceted search?

Feb 24 '23 12:02 AaronSadlerUK

This seems quite similar to #325, which has my current workaround, but I'd love to get rid of that one. Another option is to use the KeywordAnalyzer, which doesn't split the text:

		options.IndexValueTypesFactory = new Dictionary<string, IFieldValueTypeFactory>
		{
			[FIELD_DEFINITION_KEYWORD] = new DelegateFieldValueTypeFactory(name =>
				new GenericAnalyzerFieldValueType(
					name,
					_loggerFactory,
					new KeywordAnalyzer(),
					false
				)
			)
		};

Feb 24 '23 13:02 dealloc

@dealloc I'm also seeing that a native phrase query gives results where an escaped term does not (Examine 3.1):

if (_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out var index))
{
    var searcher = index.Searcher;
    
    //var query = searcher.CreateQuery(IndexTypes.Media).Field("folderDevelopmentCode", developmentCode.Escape());

    var query = searcher.CreateQuery(IndexTypes.Media).NativeQuery($"+folderDevelopmentCode:\"{developmentCode}\"");
    var results = query.Execute();

    if (!results.Any())
    {
    ...
    }
}

both result in Category: "media", LuceneQuery: {+folderDevelopmentCode:"E500117"} (note the enclosing quotes for a phrase query)

but only the nativequery coded example has a result set?

Aug 02 '23 09:08 mistyn8

I did notice there are tests for this.. that pass... https://github.com/Shazwazza/Examine/runs/13242245210#r0s6 https://github.com/Shazwazza/Examine/blob/dev/src/Examine.Test/Examine.Lucene/Search/FluentApiTests.cs#L923-L934C61

//now escape it
var exactcriteria = searcher.CreateQuery("content");
var exactfilter = exactcriteria.Field("__Path", "-1,123,456,789".Escape());
var results2 = exactfilter.Execute();
Assert.AreEqual(1, results2.TotalItemCount);

//now try with native
var nativeCriteria = searcher.CreateQuery();
var nativeFilter = nativeCriteria.NativeQuery("__Path:\\-1,123,456,789");
Console.WriteLine(nativeFilter);
var results5 = nativeFilter.Execute();
Assert.AreEqual(1, results5.TotalItemCount);

Aug 02 '23 10:08 mistyn8

Actually, it seems that new FieldDefinitionCollection(new FieldDefinition("__Path", "raw")) has some involvement here?

Trying to develop a failing test, it seems that the native query also manipulates the value. For example, var nativeFilter = nativeCriteria.NativeQuery("folderDevelopmentCode:\"E500117\""); results in +folderDevelopmentCode:e500117, so it seems to decide it's not a phrase and reverts to a standard search. Also, var nativeFilter = nativeCriteria.NativeQuery("folderDevelopmentCode:\"E500 117\""); results in +folderDevelopmentCode:"e500 117" (note the lowercasing).

So I think the issue with Escape() is that it doesn't lowercase? With var exactFilter = exactCriteria.Field("folderDevelopmentCode", "E500 117".ToLower().Escape()); the escaped fluent API returns the same results as the native query for me.

 [Test]
 public void Phrase_Matching()
 {
     var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);

     using (var luceneDir = new RandomIdRAMDirectory())
     using (var indexer = GetTestIndex(
         luceneDir,
         analyzer
         //, new FieldDefinitionCollection(new FieldDefinition("folderDevelopmentCode", "raw"))
         ))
     {


         indexer.IndexItems(new[] {
             new ValueSet(1.ToString(), "media",
                 new Dictionary<string, object>
                 {
                     {"folderDevelopmentCode", "E500 117"}
                 }),
              new ValueSet(2.ToString(), "media",
                 new Dictionary<string, object>
                 {
                     {"folderDevelopmentCode", "E500 118"}
                 })
             });

         var searcher = indexer.Searcher;

         var criteria = searcher.CreateQuery("media");
         var filter = criteria.Field("folderDevelopmentCode", "E500 117");
         Console.WriteLine($"FILTER: {filter}");
         var results1 = filter.Execute();
         //expecting 2 as this results in E500 or 117 query (prob not what we want but that's lucene)
         Assert.AreEqual(2, results1.TotalItemCount);

         //native
         var nativeCriteria = searcher.CreateQuery("media");
         var nativeFilter = nativeCriteria.NativeQuery("folderDevelopmentCode:\"E500 117\"");
         Console.WriteLine($"NATIVE: {nativeFilter}");
         var results3 = nativeFilter.Execute();
         Assert.AreEqual(1, results3.TotalItemCount);

         //exact match
         var exactCriteria = searcher.CreateQuery("media");
         var exactFilter = exactCriteria.Field("folderDevelopmentCode", "E500 117".Escape());
         Console.WriteLine($"EXACT: {exactFilter}");
         var results2 = exactFilter.Execute();
         Assert.AreEqual(1, results2.TotalItemCount);
     }
 }

Aug 02 '23 11:08 mistyn8

Hi all, I know this topic is old but will add some clarity:

As @callumbwhyte notes, Escape() creates a PhraseQuery, but it is not subject to the tokenizing and analysis that the field's configured analyzer would apply, because it operates outside of the query parser and so requires an exact match.

Here's an example:

    var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = GetTestIndex(luceneDir, analyzer))
    {
        indexer.IndexItems(new[] {
            ValueSet.FromObject(1.ToString(), "content",
                new { phrase = "If You Can't Stand the Heat, Get Out of the Kitchen" }),
            ValueSet.FromObject(2.ToString(), "content",
                new { phrase = "When the Rubber Hits the Road" }),
            ValueSet.FromObject(3.ToString(), "content",
                new { phrase = "A Fool and His Money Are Soon Parted" }),
            ValueSet.FromObject(4.ToString(), "content",
                new { spaphraseth = "A Hundred and Ten Percent" }),
        });

        var searcher = (BaseLuceneSearcher)indexer.Searcher;

        var query = searcher
            .CreateQuery(IndexTypes.Content)
            .NativeQuery("+phrase:\"Get Out of the Kitchen\"");

        Console.WriteLine(query);
        var results = query.Execute();

        Assert.AreEqual(1, results.TotalItemCount);

        query = searcher
            .CreateQuery(IndexTypes.Content)
            .Field("phrase", "Get Out of the Kitchen".Escape());

        Console.WriteLine(query);
        results = query.Execute();

        Assert.AreEqual(1, results.TotalItemCount);
    }

What does this yield?

The first assertion works - and the output looks like: { Category: content, LuceneQuery: +(+phrase:"get out ? ? kitchen") }
The second assertion fails - and the output looks like: { Category: content, LuceneQuery: +phrase:"Get Out of the Kitchen" }

Why?

The first query uses the Query Parser to parse a phrase since it is contained in quotes and it passes through the tokenizer/analyzer which in this case is the StandardAnalyzer which lowercases everything and strips out common words.
The second query uses PhraseQuery under the hood, this does not go through the tokenizer/analyzer and becomes an exact match.
- It does not match because the data in the index has been tokenized/analyzed, the actual value in there is lowercased and has common words removed
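The difference described above can be reproduced with plain Lucene.NET. This is a sketch under assumptions, not Examine's actual internals: it assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages, and it builds the unanalyzed query by hand from a single literal term.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

var luceneVersion = LuceneVersion.LUCENE_48;
using var analyzer = new StandardAnalyzer(luceneVersion);
using var dir = new RAMDirectory();

// Index one document; the analyzer lowercases and drops stop words here.
using (var writer = new IndexWriter(dir, new IndexWriterConfig(luceneVersion, analyzer)))
{
    var doc = new Document();
    doc.Add(new TextField("phrase",
        "If You Can't Stand the Heat, Get Out of the Kitchen", Field.Store.YES));
    writer.AddDocument(doc);
}

using var reader = DirectoryReader.Open(dir);
var searcher = new IndexSearcher(reader);

// Query 1: goes through the query parser, so the same analyzer is applied.
var parser = new QueryParser(luceneVersion, "phrase", analyzer);
Query parsed = parser.Parse("\"Get Out of the Kitchen\"");

// Query 2: built by hand from the literal text, with no analysis at all.
var literal = new PhraseQuery();
literal.Add(new Term("phrase", "Get Out of the Kitchen"));

Console.WriteLine($"{parsed} -> {searcher.Search(parsed, 10).TotalHits} hit(s)");
Console.WriteLine($"{literal} -> {searcher.Search(literal, 10).TotalHits} hit(s)");
```

The parsed query should find the document, while the literal one should not, because no term with that exact text exists in the analyzed index.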

I understand the confusion around 'Escape()' but it does essentially mean 'exact match', not phrase. So if you are using Escape() then you would need to declare your field type as 'Raw', which equates to using the KeywordAnalyzer under the hood.

So where do we go from here?

- We can introduce a .Phrase() extension method which will create a PhraseQuery based on the query parser and will respect the analyzer used; Escape() will remain an exact match.
- We can change the .Escape() handling to use a PhraseQuery based on the query parser, which will respect the analyzer and make Escape() essentially become a phrase query. IMO this is a breaking/unbreaking change, since some folks have worked around this behavior so far. It also means .Escape() is no longer really an exact match, though it sort of would be, depending on the analyzer of that field. Although this might seem like a breaking change, if we look at the FluentApiTests, all usages of .Escape() are solely on RAW (KeywordAnalyzer) field types. So perhaps this is the correct fix, because Escape() probably wouldn't work unless it is a raw field anyway. I'm a +1 for this change.

Jun 14 '24 17:06 Shazwazza

My vote goes to having a Phrase() method.

Perhaps the terminology of Escape() is also confusing here? Maybe Exact() is more precise, and could be added now + Escape() obsoleted without breaking anything and leaving space for future...

Jun 14 '24 17:06 callumbwhyte

My immediate thought was exactly the same as Callum, so that gets my vote. Makes things clearer and doesn’t break existing code, but prompts to move to the newer options.

Jun 14 '24 18:06 mattbrailsford

@callumbwhyte + @mattbrailsford

Thanks! Escape was originally, a very long time ago, meant to ensure that reserved chars were escaped and didn't cause lucene issues, but then this evolved in various ways over the years because it turns out we didn't need to do that with phrase queries.

I'm really not sure what the use case would be, moving forward, for introducing an 'Exact' method that behaves the same as the current 'Escape', because the value would need to exactly match what is stored in the index, which depends on the field's analyzer. I don't want to re-introduce more confusion and have to re-explain this all over again, because we'll be back in the same boat.

I'm leaning towards creating "Phrase" and then just obsoleting "Escape"?

Jun 14 '24 20:06 Shazwazza
