pySBD
Arabic sentence split on the Arabic comma
Describe the bug
The Arabic segmenter splits a sentence at the Arabic comma (،), which is not a sentence boundary.
To Reproduce
Steps to reproduce the behavior:
import pysbd
text = "هذه تجربة، للغة العربية"
seg = pysbd.Segmenter(language="ar", clean=True)
print(seg.segment(text))
Output: ['هذه تجربة،', 'للغة العربية']
Expected behavior
The text should not be split on the Arabic comma.
Expected output: ['هذه تجربة، للغة العربية']
Additional context
I fixed it locally by editing pysbd/lang/arabic.py and deleting ، from SENTENCE_BOUNDARY_REGEX.
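To illustrate why removing the comma from the boundary character class changes the result, here is a minimal standalone sketch using only Python's `re` module. The two patterns and the `split_sentences` helper are hypothetical and simplified for illustration; they are not copied from pysbd, whose actual `SENTENCE_BOUNDARY_REGEX` may differ.

```python
import re

ARABIC_COMMA = "\u060C"          # ،
ARABIC_QUESTION_MARK = "\u061F"  # ؟

# Hypothetical, simplified boundary patterns (NOT the real pysbd regex).
# Each one lazily matches up to a boundary character, or else to end of text.
WITH_COMMA = r".*?[.!?\u061F\u060C]|.*?$"
WITHOUT_COMMA = r".*?[.!?\u061F]|.*?$"


def split_sentences(text, pattern):
    """Return the non-empty, whitespace-stripped matches of `pattern`."""
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]


# Same text as in the report, built with the explicit code point.
text = "هذه تجربة" + ARABIC_COMMA + " للغة العربية"

print(split_sentences(text, WITH_COMMA))     # two segments, split at the comma
print(split_sentences(text, WITHOUT_COMMA))  # one segment, comma kept inline
```

With the comma in the character class, the lazy match stops at ، and the rest of the text becomes a second segment. Without it, no boundary character is found, so the fallback branch consumes the whole string as one sentence, matching the expected output above.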