Xapiand

N-Gram index/search example

Open isoos opened this issue 6 years ago • 3 comments

Do you have an example of how to do N-gram based search indexing and retrieval?

E.g. I'd like to index the phrase "Search and Storage Server", and when I search for "storaeg" it should have a good chance of finding it, because some of the query's N-grams match the indexed N-grams ("sto", "tor", "ora").

I think Xapian supports some kind of N-grams, but I couldn't find an end-to-end example with Xapiand.

isoos · Apr 01 '19 21:04

Out of the box, Xapian doesn’t support ngrams; what it supports is wildcard expansion (which could be used as an alternative to edge ngrams). If what you need are full ngrams, Xapiand currently doesn’t have them, but it’s something that shouldn’t be too hard to implement and definitely something I’d be interested in adding. For the time being, you’d have to form the ngrams on the client side and pass them in, perhaps as a keywords array.

Kronuz · Apr 02 '19 12:04
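
To illustrate the client-side approach suggested above, here is a minimal Python sketch that builds character trigrams before a document is sent to the server. The function name `char_ngrams` and the `ngrams` field name are hypothetical, chosen only for this example; this is not part of Xapiand's API, and how the array is indexed depends on your schema.

```python
def char_ngrams(text, n=3):
    """Return the sorted, unique character n-grams of each word in text."""
    grams = set()
    for word in text.lower().split():
        # Words shorter than n produce no n-grams.
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return sorted(grams)


phrase = "Search and Storage Server"

# Hypothetical document body: the n-grams travel as a plain keywords array.
document = {
    "title": phrase,
    "ngrams": char_ngrams(phrase),
}

# The misspelled query still shares trigrams with the indexed phrase.
query_grams = char_ngrams("storaeg")
shared = sorted(set(document["ngrams"]) & set(query_grams))
print(shared)
```

The shared trigrams here are the ones mentioned in the original question ("sto", "tor", "ora"), which is why a keyword match on the array can still find the document despite the typo.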

Thanks for the details! Would I be able to query that keyword array with a threshold filter? E.g. return only the results where the array matches at least 66% of the query's N-grams?

isoos · Apr 02 '19 12:04
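
For clarity, the threshold being asked about can be sketched as a client-side post-filter in Python: compute the fraction of the query's N-grams that appear in a document's array and keep the document only if that fraction reaches the cutoff. The names `trigrams`, `match_fraction`, and `passes_threshold` are hypothetical, and nothing here assumes Xapiand offers such a filter on the server.

```python
def trigrams(text, n=3):
    """Return the set of character n-grams of each word in text."""
    return {
        word[i:i + n]
        for word in text.lower().split()
        for i in range(len(word) - n + 1)
    }


def match_fraction(query, doc_grams):
    """Fraction of the query's n-grams that are present in doc_grams."""
    query_grams = trigrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & doc_grams) / len(query_grams)


def passes_threshold(query, doc_grams, threshold=0.66):
    """Keep a result only if enough of the query's n-grams matched."""
    return match_fraction(query, doc_grams) >= threshold


doc_grams = trigrams("Search and Storage Server")

print(match_fraction("storaeg", doc_grams))    # 3 of 5 trigrams match
print(passes_threshold("storaeg", doc_grams))  # below a 66% cutoff
print(passes_threshold("storage", doc_grams))  # every trigram matches
```

Note that with trigrams the transposed query "storaeg" matches only 3 of its 5 N-grams, so a 66% cutoff would reject it; the threshold would need tuning against real typos.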

Sorry, I don’t quite get what you mean. You can add weights when you query something.

Kronuz · Apr 02 '19 14:04