David Smiley
I think that if some other component of a larger pipeline wanted to do segmentation, perhaps because it's more sophisticated, then the feature I propose here simply wouldn't be used....
I'm sure "this can happen"; I could add a trivial test. Nice trick on the sentence alignment. What could be useful is an additional TokenFilter that recognizes a large jump...
Great suggestion @simonatdrg; LOL, it's much simpler than my idea of the TokenFilter :-)
I suggest simply removing the `` if there is any doubt. It's debatable; there is no actual OpenSextant _organization_. RE Copyright: It's not clear to me how to handle this....
So sorry I didn't respond sooner. My GitHub notification settings needed to be updated. DevNotes.txt is perhaps something I shouldn't have committed; ignore it. Though it does have a curl...
BTW, I noticed your comment on http://sujitpal.blogspot.com/2014/02/fuzzy-string-matching-with.html; I thought your name looked familiar.
@jprochaz See #23 (no change). I forget whether the 1.x branch supports Solr 4.7, but your comment says it doesn't. It's probably something really minor like default...
Very cool! After we switch to Java 8 (which I see this code requires, since it uses .stream()), what do you think of providing this code with the text tagger?...
I just released 2.3 and bumped the tagger to Solr 6 and Java 8. Yes, Solr 6 requires Java 8. Feel free to send a pull request for this contribution....
Now that Solr 7.4.0 includes the tagger, perhaps you might want to propose your addition directly to Apache Solr / SolrJ.