Jens Reimann

627 comments by Jens Reimann

> An improvement I can imagine we could add is to look for step in the path when this message is returned to the user. If it's not found, we...

My use case is to have all simple tokens plus all ngrams.
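To make that use case concrete, here is a minimal std-only sketch of "all simple tokens plus all ngrams". It does not use the tantivy API: `simple_tokens`, `all_ngrams`, and `tokens_plus_ngrams` are hypothetical helper names, and generating n-grams per word (rather than over the raw text, whitespace included) is an assumption made for illustration.

```rust
/// Split on non-alphanumeric characters and lowercase each piece.
/// (Hypothetical stand-in for a simple tokenizer plus a lowercasing filter.)
fn simple_tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_lowercase())
        .collect()
}

/// All character n-grams of `word` with lengths in `min..=max`.
/// Assumes `min >= 1`; n-grams never cross a word boundary here.
fn all_ngrams(word: &str, min: usize, max: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out: Vec<String> = Vec::new();
    for start in 0..chars.len() {
        for len in min..=max {
            if start + len > chars.len() {
                break;
            }
            out.push(chars[start..start + len].iter().collect::<String>());
        }
    }
    out
}

/// Emit every simple token first, then the n-grams of each simple token.
fn tokens_plus_ngrams(text: &str, min: usize, max: usize) -> Vec<String> {
    let simple = simple_tokens(text);
    let mut out = simple.clone();
    for word in &simple {
        out.extend(all_ngrams(word, min, max));
    }
    out
}
```

Calling `tokens_plus_ngrams("Sep Oct", 2, 3)` yields the two whole words followed by their 2- and 3-character n-grams, so both exact-word and partial-word lookups can hit the same field.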

I was able to incorporate most of the feedback you mentioned. It's less explicit without the `enum`, but works the same way. There was just one call to `second.advance()` missing,...

> If you feel like code-golfing, I think those two calls to `second.advance()` could even be merged

I like that, pushed. So, the remaining thing seems to be the position....
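For readers without the diff at hand, the following is one guess at what a chained token stream with a single `second.advance()` call site might look like. It is not the code from the pull request: the `TokenStream` trait here is a simplified, hypothetical stand-in (not tantivy's trait), `VecStream`, `Chain`, and `collect_tokens` are invented for illustration, and position handling, the open point mentioned above, is left out entirely.

```rust
/// Hypothetical, simplified token stream (not tantivy's trait).
trait TokenStream {
    /// Move to the next token; returns false once the stream is exhausted.
    fn advance(&mut self) -> bool;
    /// The current token; only valid after `advance` returned true.
    fn token(&self) -> &str;
}

/// Test stream backed by a vector of tokens.
struct VecStream {
    tokens: Vec<String>,
    next: usize,
}

impl VecStream {
    fn new(tokens: &[&str]) -> Self {
        VecStream { tokens: tokens.iter().map(|t| t.to_string()).collect(), next: 0 }
    }
}

impl TokenStream for VecStream {
    fn advance(&mut self) -> bool {
        if self.next < self.tokens.len() {
            self.next += 1;
            true
        } else {
            false
        }
    }

    fn token(&self) -> &str {
        &self.tokens[self.next - 1]
    }
}

/// Yields all tokens of `first`, then all tokens of `second`.
/// A plain bool tracks the state instead of an explicit enum.
struct Chain<A, B> {
    first: A,
    second: B,
    first_done: bool,
}

impl<A: TokenStream, B: TokenStream> Chain<A, B> {
    fn new(first: A, second: B) -> Self {
        Chain { first, second, first_done: false }
    }
}

impl<A: TokenStream, B: TokenStream> TokenStream for Chain<A, B> {
    fn advance(&mut self) -> bool {
        if !self.first_done {
            if self.first.advance() {
                return true;
            }
            self.first_done = true;
        }
        // Single call site: reached right after `first` runs dry
        // and on every later call.
        self.second.advance()
    }

    fn token(&self) -> &str {
        if self.first_done { self.second.token() } else { self.first.token() }
    }
}

/// Drain a stream into a vector of owned tokens.
fn collect_tokens<S: TokenStream>(mut stream: S) -> Vec<String> {
    let mut out = Vec::new();
    while stream.advance() {
        out.push(stream.token().to_string());
    }
    out
}
```

Falling through to `self.second.advance()` after marking the first stream as done is what removes the need for a second, separate call in the transition branch.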

> Do you have a reference for a ngram tokenizer that ends the ngram on whitespace?

The example above?

> `.filter(Stemmer::new(Language::English))` will give unexpected results

Yea, I noticed that :D...

> I meant a reference that does the tokenization in `September`, `October` you suggested.

That's the `SimpleTokenizer` one. It gives me:

```
september october
```

No, it is not. I am sorry, but then I don't understand your question.

So I guess I can come close to that by somehow reversing the API:

```rust
let ngram = NgramTokenizer::all_ngrams(3, 8).unwrap();
let mut text = TextAnalyzer::builder(
    Stemmer::new(Language::English)
        .transform(LowerCaser.transform(RemoveLongFilter::limit(40).transform(SimpleTokenizer::default())))
        .chain(LowerCaser.transform(RemoveLongFilter::limit(40).transform(SimpleTokenizer::default())))
        .chain(LowerCaser.transform(ngram)),...
```