
Term Query is not tokenized (?)

Open • afbarbaro opened this issue 8 months ago • 9 comments

I'm testing tantivy-py, and I'm finding it pretty great. However, I ran into what seems to be an issue with the Python package: term queries do not appear to be tokenized when using the searcher.search(query, ..) method. As a result, I can't really use the en_stem tokenizer, since it is applied only when indexing documents and is not exposed for me to tokenize the query.

I'm testing tantivy-py with the Simple Wikipedia Example Set from Cohere, and here's what I see with a few sample queries:

  • Australia monarchy --> no good hits unless I change it to Australia monarch
  • Titanic sink --> no good hits unless I change it to Titan sink
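To illustrate what I think is going on, here is a minimal pure-Python sketch. It does not use tantivy at all: toy_stem is a hypothetical stand-in for en_stem (it only lowercases and strips a trailing "ic"), and build_index, term_lookup, and analyzed_lookup are names I made up for the example. The assumption is that documents are stemmed at index time while a term query looks up the raw term exactly as given.

```python
# Toy model only: this is NOT tantivy's implementation or API.

def toy_stem(token: str) -> str:
    """Hypothetical stand-in for en_stem: lowercase, strip a trailing 'ic'."""
    token = token.lower()
    if token.endswith("ic") and len(token) > 4:
        return token[:-2]
    return token


def analyze(text: str) -> list[str]:
    """Split on whitespace and stem every token."""
    return [toy_stem(t) for t in text.split()]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index whose keys are the *stemmed* terms."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_lookup(index: dict[str, set[int]], term: str) -> set[int]:
    """Exact lookup of the raw term, with no analysis applied."""
    return index.get(term, set())


def analyzed_lookup(index: dict[str, set[int]], text: str) -> set[int]:
    """Run the query through the same analysis as the documents first."""
    hits: set[int] = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


docs = {1: "Titanic sank in 1912", 2: "Sydney is in Australia"}
index = build_index(docs)

raw_hits = term_lookup(index, "Titanic")           # raw term is not in the index
stem_hits = term_lookup(index, "titan")            # manually stemmed term is
analyzed_hits = analyzed_lookup(index, "Titanic")  # query analyzed like the docs
```

In this toy model the raw term misses because the index only contains the stemmed form, which matches the pattern in the queries above.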

Is this a "feature" or a "bug"? I don't mind tokenizing the query myself before calling the search method, but tokenizers are not exposed in the Python bindings.

Any suggestions?

afbarbaro • Jun 06 '24 09:06