Florian Leitner

Results: 35 comments by Florian Leitner

Hi Felix, thank you for your kind words, glad you like syntok. Well, semantically, there is no easier way to get to these offsets, because they depend on all the...

Maybe I should add that if efficiency does matter to you, you could forgo the end offset entirely and just generate a list of (start) integers. The end then can...
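To illustrate the start-only idea, here is a minimal sketch. It does not call syntok itself: the `(offset, value)` token tuples and the helper names `sentence_starts` and `spans_from_starts` are hypothetical stand-ins for whatever a tokenizer returns. Since the comment is cut off, pairing each start with the next start is only an assumption about how the end might be recovered.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for tokenizer output: each token is (offset, value).
Token = Tuple[int, str]


def sentence_starts(sentences: Sequence[Sequence[Token]]) -> List[int]:
    """Return the character offset of the first token of each non-empty sentence."""
    return [sentence[0][0] for sentence in sentences if sentence]


def spans_from_starts(starts: Sequence[int], text_length: int) -> List[Tuple[int, int]]:
    """Assumption: use the next start (or the text length) as the exclusive end."""
    ends = list(starts[1:]) + [text_length]
    return list(zip(starts, ends))


text = "Hello world. How are you?"
sentences = [
    [(0, "Hello"), (6, "world"), (11, ".")],
    [(13, "How"), (17, "are"), (21, "you"), (24, "?")],
]

starts = sentence_starts(sentences)          # only the start integers are stored
spans = spans_from_starts(starts, len(text))  # ends derived on demand
```

With this pairing, each span also covers any whitespace between a sentence and the next one, so a caller who needs exact sentence boundaries would still have to trim the slice.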

Correct, I would particularly vouch for the first two functions (with the end offset), because the second two (without) seem almost too trivial to add. The two functions should fit...

Good point. When I developed the first version, segtok, there were no good benchmark datasets for sentence segmentation around that had sufficient coverage of the tricky cases this library can...

The above being said, what I am currently neither interested in nor have the time for is manually comparing my library against another, case by case. So if someone wants...

Another interesting tool to compare/benchmark against is pySBD (https://github.com/nipunsadvilkar/pySBD). Note that pySBD is supposedly based on the Pragmatic Segmenter.

Thank you, Felix, for bringing this up. A valid feature request: colon (and semi-colon) handling is indeed a bit of a borderline affair, and technically they are sentence separators. It...

In general, libraries such as nltk and CoreNLP tend to severely over-split, which was the major reason for me to come up with my own. Hence, I agree, adding semicolons...

Release 1.3.1 now supports semi-colon segmentation. I will leave this ticket open, however, as this was specifically about segmenting colons.

As I don't have access to a Windows instance, either I need to remove support, it fixes itself, or someone can help me figure out what is going wrong and...