Jens Reimann
> An improvement I can imagine we could add is to look for step in the path when this message is returned to the user. If it's not found, we...
I applied nightly `rustfmt`.
My use case is to have all simple tokens plus all ngrams.
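To make that use case concrete, here is a minimal sketch that uses only the Rust standard library, not tantivy's tokenizer API. The function names `simple_tokens`, `all_ngrams`, and `tokens_plus_ngrams` are hypothetical and chosen for illustration. The sketch assumes that "simple tokens" means lowercased alphanumeric runs, that n-grams are character n-grams taken within each token, and that `min >= 1`.

```rust
/// Split on non-alphanumeric characters and lowercase each piece.
/// This only approximates a simple tokenizer followed by a lowercasing filter.
fn simple_tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_lowercase())
        .collect()
}

/// All character n-grams of `word` with lengths in `min..=max`,
/// ordered by start position, then by length. Assumes `min >= 1`.
fn all_ngrams(word: &str, min: usize, max: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out: Vec<String> = Vec::new();
    for start in 0..chars.len() {
        for len in min..=max {
            if start + len > chars.len() {
                break; // longer n-grams would run past the end of the word
            }
            out.push(chars[start..start + len].iter().collect::<String>());
        }
    }
    out
}

/// Emit every simple token, followed by its n-grams.
/// An n-gram identical to the whole token is skipped so the token appears once.
fn tokens_plus_ngrams(text: &str, min: usize, max: usize) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for token in simple_tokens(text) {
        out.push(token.clone());
        out.extend(
            all_ngrams(&token, min, max)
                .into_iter()
                .filter(|gram| *gram != token),
        );
    }
    out
}
```

Calling `tokens_plus_ngrams("Sep Oct", 2, 3)` yields `sep`, `se`, `ep`, `oct`, `oc`, `ct`: each whole token first, then its n-grams. Token positions and offsets, which the discussion also touches on, are deliberately left out of this sketch.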
I was able to incorporate most of the feedback you mentioned. It's less explicit without the `enum`, but works the same way. There was just one call to `second.advance()` missing,...
> If you feel like code-golfing, I think those two calls to `second.advance()` could even be merged

I like that, pushed. So, the remaining thing seems to be the position....
Fixed the test issue.
> Do you have a reference for a ngram tokenizer that ends the ngram on whitespace?

The example above?

> `.filter(Stemmer::new(Language::English))` will give unexpected results

Yea, I noticed that :D...
> I meant a reference that does the tokenization in `September`, `October` you suggested.

That's the `SimpleTokenizer` one. It gives me:

```
september october
```
No it is not. I am sorry, but then I don't understand your question.
So I guess I can come close to that by somehow reversing the API: ```rust let ngram = NgramTokenizer::all_ngrams(3, 8).unwrap(); let mut text = TextAnalyzer::builder( Stemmer::new(Language::English) .transform(LowerCaser.transform(RemoveLongFilter::limit(40).transform(SimpleTokenizer::default()))) .chain(LowerCaser.transform(RemoveLongFilter::limit(40).transform(SimpleTokenizer::default()))) .chain(LowerCaser.transform(ngram)),...