tantivy-py
Term Query is not tokenized (?)
I'm testing tantivy-py, which I'm finding pretty great. However, I bumped into what seems to be an issue with the Python package: term queries do not appear to be tokenized when using the searcher.search(query, ..) method. That means I can't really use the en_stem tokenizer, since it is only applied when indexing documents and is not exposed for me to tokenize the query.
I'm testing tantivy-py with the Simple Wikipedia Example Set from Cohere, and here's what I see with a few sample queries:
- Australia monarchy --> no good hits unless I change it to Australia monarch
- Titanic sink --> no good hits unless I change it to Titan sink
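To make the suspected mismatch concrete, here is a toy sketch in plain Python. It does not use tantivy-py at all: toy_stem is a made-up stand-in for a stemming tokenizer (not the real en_stem rules), and build_index, raw_term_lookup and analyzed_lookup are hypothetical helpers. It only shows why a raw term can miss an index whose terms were stemmed at indexing time, assuming that is what is happening here.

```python
# Toy illustration only: NOT tantivy's en_stem, just a stand-in stemmer.
def toy_stem(token: str) -> str:
    """Lowercase the token and strip one of a few suffixes (made-up rules)."""
    token = token.lower()
    for suffix in ("ic", "s"):
        # Only strip when a stem of at least three characters remains.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def raw_term_lookup(index, term):
    """Look the term up exactly as given, with no analysis."""
    return index.get(term, set())


def analyzed_lookup(index, term):
    """Run the term through the same stemmer used at indexing time first."""
    return index.get(toy_stem(term), set())


docs = ["The Titanic sank in 1912", "Titan is a moon of Saturn"]
index = build_index(docs)

print(raw_term_lookup(index, "titanic"))   # unstemmed term: no hits
print(raw_term_lookup(index, "titan"))     # already matches the stored stem
print(analyzed_lookup(index, "Titanic"))   # stemmed before lookup: hits
```

In this sketch the index only ever stores the stem, so the lookup succeeds only when the query term has been reduced the same way, which would match the behaviour I'm seeing.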
Is this a "feature" or a "bug"? I don't mind tokenizing the query myself before calling the search method, but tokenizers are not exposed in the Python bindings.
Any suggestions?