
Special tokens support

Open jeffkayne opened this issue 4 years ago • 7 comments

As mentioned on Gitter, I am trying to query data with YQL to find all documents that contain the string "&Free". The exact query I'm executing is: "SELECT text_raw FROM sources post WHERE text_raw CONTAINS '&free';" This query matches every document that contains the string "free", not just those that contain "&free".

The specific use-case for this is that I am looking for brand names within documents.

Vespa version: 7.165.5

jeffkayne avatar Mar 09 '20 11:03 jeffkayne

Is the use case here that "&free" is a proper name that you need to match?

bratseth avatar Mar 10 '20 07:03 bratseth

Yes, correct

jeffkayne avatar Mar 10 '20 09:03 jeffkayne

Please see match modes: https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match. The default for string fields is text, which implies tokenization where punctuation characters are removed.
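To illustrate why the "&" disappears, here is a rough Python sketch of a tokenizer that drops punctuation. It is not Vespa's actual tokenizer; `tokenize_text` and its regex are assumptions made up purely for illustration.

```python
import re

def tokenize_text(value: str) -> list[str]:
    """Toy tokenizer (an assumption, not Vespa's implementation):
    lowercase the input and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", value.lower())

# The document text and the query term go through the same tokenization,
# so the "&" is gone on both sides before any matching happens.
doc_tokens = tokenize_text("Sign up for &Free today!")
query_tokens = tokenize_text("&free")

print(doc_tokens)
print(query_tokens)
```

Under this kind of tokenization, a query for "&free" and a query for "free" end up as the same token, which is consistent with the behaviour described in the issue.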

jobergum avatar Mar 10 '20 09:03 jobergum

I think the way forward, then, is to make "special tokens" work independently of the linguistics plugin. I'll try to find some time to do that.

bratseth avatar Mar 10 '20 09:03 bratseth

Yes, it would be great if we could use an escape, for example "\x##", to get an exact match of the string. Cheers!

jeffkayne avatar Mar 11 '20 09:03 jeffkayne

An escape is no problem, but it doesn't solve the issue, because the engine needs to decide at write time which tokens to index.
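A small Python sketch of why the decision has to happen at write time. This is only a toy inverted index under assumed names (`SPECIAL_TOKENS`, `tokenize`, `build_index` are hypothetical and not part of Vespa): if "&free" was never indexed as a token, no query-side escape can find it.

```python
import re

# Hypothetical list of special tokens known when documents are written.
SPECIAL_TOKENS = ["&free"]

def tokenize(value, special_tokens=()):
    """Toy tokenizer: try special tokens first (longest first),
    then fall back to runs of letters and digits."""
    alternatives = [re.escape(t) for t in sorted(special_tokens, key=len, reverse=True)]
    alternatives.append(r"[a-z0-9]+")
    return re.findall("|".join(alternatives), value.lower())

def build_index(docs, special_tokens=()):
    """Build a toy inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text, special_tokens):
            index.setdefault(token, set()).add(doc_id)
    return index

docs = {1: "Try &Free now", 2: "Free shipping"}

# Index written without special tokens: "&free" is never stored.
plain_index = build_index(docs)
# Index written with special tokens: "&free" is stored as its own token.
special_index = build_index(docs, SPECIAL_TOKENS)

print(plain_index.get("&free"))
print(sorted(special_index["&free"]))
```

In the plain index, both documents are only reachable through the token "free"; in the index built with the special token list, document 1 is reachable through "&free" and document 2 through "free".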

bratseth avatar Mar 11 '20 09:03 bratseth

soon timed out

baldersheim avatar Sep 13 '23 20:09 baldersheim