pragmatic_segmenter
French ellipsis ("trois petits points") is not handled
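Hi, in French we often end a sentence with an ellipsis (...), but here the text is not segmented correctly. I think this is because "etc." is also an abbreviation, written with a period, so the ellipsis after "etc" gets split. This is the current output: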
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<wrapper>
<s>J'aime le sport etc..</s>
<s>. Cependant est ce vrai ?</s>
</wrapper>
It should look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<wrapper>
<s>J'aime le sport etc...</s>
<s>Cependant est ce vrai ?</s>
</wrapper>
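To make the expected behaviour concrete, here is a minimal Python sketch of one possible rule. It is a hypothetical helper, not part of pragmatic_segmenter (which is a Ruby gem), and it assumes the original input was `J'aime le sport etc... Cependant est ce vrai ?`, since the input text is not shown above. The rule: when "etc" is followed by two or more periods and the next word starts with a capital letter, the whole "etc..." stays with the first sentence and the break goes after it.

```python
import re

# Hypothetical rule (not pragmatic_segmenter's API): "etc" followed by two or
# more periods, then whitespace and a capitalized word (including accented
# Latin capitals such as É or À), is treated as a sentence-final ellipsis.
# A single "etc." is deliberately NOT matched, because it is ambiguous
# between an abbreviation inside a sentence and a sentence end.
ETC_ELLIPSIS = re.compile(r"\betc\.{2,}(?=\s+[A-ZÀ-ÖØ-Ý])")


def split_etc_ellipsis(text: str) -> list[str]:
    """Split text right after 'etc...' when the next word is capitalized."""
    sentences = []
    start = 0
    for match in ETC_ELLIPSIS.finditer(text):
        # Keep the full ellipsis with the preceding sentence.
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever follows the last break is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_etc_ellipsis("J'aime le sport etc... Cependant est ce vrai ?"))
```

Running it prints `["J'aime le sport etc...", 'Cependant est ce vrai ?']`, which matches the expected segmentation. If the word after the ellipsis is lowercase (e.g. `etc... et puis`), the text is left as one sentence.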
Thanks for your time.