🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box.

Results 33 pySBD issues

**Describe the bug** Instances of `--` often break the segmentation. When these are replaced, segmenting the same sentence works as expected. **To Reproduce** Breaking Examples: 1. ``` "Volumes within all...
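The report above says segmentation works once the `--` sequences are replaced. A minimal preprocessing sketch of that idea follows. The helper `replace_double_hyphens` is hypothetical and not part of pySBD, and the report excerpt does not say what the sequences were replaced with, so the replacement string is left as a parameter with an arbitrary default.

```python
import re


# Hypothetical helper, not part of pySBD. The issue excerpt does not state
# what `--` was replaced with, so the replacement is a caller-supplied value.
def replace_double_hyphens(text: str, replacement: str = ", ") -> str:
    """Replace runs of two or more ASCII hyphens and any surrounding whitespace."""
    return re.sub(r"\s*-{2,}\s*", replacement, text)


# Made-up sample sentence, used only to illustrate the substitution.
sample = "The results -- as expected -- were mixed. More tests followed."
cleaned = replace_double_hyphens(sample)
print(cleaned)
```

Single hyphens, as in `well-known`, are left untouched. Note that the substitution changes character offsets, so spans computed on the cleaned text no longer map directly onto the original.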

🌐 Added Support for the Bangla Language This pull request introduces support for the Bangla language within the project. Given the linguistic similarity between Bangla and Hindi, where both languages...

**Describe the bug** A text containing a particular combination of single quotes doesn't get segmented. **To Reproduce** Steps to reproduce the behavior: Input text - Come work for us in...

When dealing with a long statement of facts quoted from legal text, the text is not split up within left double quotations and right double quotations. This is different than...

**Describe the bug** Control characters like `\x1f` break German sentence segmentation at `format_numbered_list_with_periods` step. **To Reproduce** Steps to reproduce the behavior: Input text - `'1.\x1f\x1fApfel\x1d2.\x1f\x1fBanana'` Code: ``` import pysbd example_text...
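One possible workaround for input like this is to strip control characters before the text reaches the segmenter. The sketch below is an assumption, not a fix taken from the issue: the helper `strip_control_chars` is hypothetical, and whether it avoids the failure in `format_numbered_list_with_periods` is not verified here.

```python
import unicodedata


# Hypothetical helper, not part of pySBD: replaces Unicode control
# characters (category "Cc") with a placeholder, keeping common whitespace.
def strip_control_chars(text: str, replacement: str = " ") -> str:
    keep = {"\n", "\r", "\t"}
    return "".join(
        replacement if unicodedata.category(ch) == "Cc" and ch not in keep else ch
        for ch in text
    )


# Input text quoted in the issue above.
example_text = "1.\x1f\x1fApfel\x1d2.\x1f\x1fBanana"
cleaned = strip_control_chars(example_text)
print(repr(cleaned))
```

Replacing each control character with a space, rather than deleting it, keeps the string length unchanged, which can matter if character offsets are used downstream.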

**Describe the bug** When an open parenthesis appears in certain situations in German text, it can cause a crash when running sentence splitting. **To Reproduce** from pysbd import Segmenter text...

**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** input_str = """This is part 3 of MAMI-san's hair timelineThe previous hair timelines can be...

- Based on the Slovak language, which is very close to Czech, I have created initial support for Czech-language sentence splitting

Bumps [nltk](https://github.com/nltk/nltk) from 3.5 to 3.9. Changelog Sourced from nltk's changelog. Version 3.9.1 2024-08-19 Fixed bug that prevented wordnet from loading Version 3.9 2024-08-18 Avoid need for pickled models, resolves...

dependencies

German texts often use a pair of `„` and `“` to delineate quoted text. These cause issues, for example in the text below: `Nach einem kurzen Zögern näherte sie...