pragmatic_segmenter
French à after abbreviation / Min. abbreviations
I noticed that à does not seem to be detected as a lower-case letter after an abbreviation:
assert_equal 1, segment("85,7 cm (33 3/4 po) min. à 88,9 cm (35 po) max.", 'fr').size # Fails
Digging into this, there seem to be two separate issues: "min." appears to be missing from the French abbreviation list (it is probably common in many languages), and à is also not handled after an abbreviation in English, where "min." is recognized:
assert_equal 1, segment("33-3/4” (85.7 cm) min. to 35” (88.9 cm) max.", 'en').size # Works
assert_equal 1, segment("33-3/4” (85.7 cm) min. to 35” (88.9 cm) max.", 'fr').size # Fails
assert_equal 1, segment("85,7 cm (33 3/4 po) min. à 88,9 cm (35 po) max.", 'en').size # Fails
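One possible explanation for the second issue, offered as an assumption rather than a confirmed diagnosis: if the rule that decides whether a sentence continues after an abbreviation checks the next word with an ASCII-only class such as [a-z], then à will never count as lower-case. The sketch below shows the difference in plain Ruby. The constants and the helper are hypothetical and are not part of pragmatic_segmenter.

```ruby
# Hypothetical patterns for illustration only; these are NOT taken from
# pragmatic_segmenter's source.
ASCII_LOWER   = /\A[a-z]/    # matches ASCII lower-case letters only
UNICODE_LOWER = /\A\p{Ll}/   # matches any Unicode lower-case letter

# Hypothetical helper: does the word following an abbreviation start with a
# lower-case letter according to the given pattern?
def starts_lowercase?(word, pattern)
  pattern.match?(word)
end

puts starts_lowercase?("to", ASCII_LOWER)    # true
puts starts_lowercase?("à", ASCII_LOWER)     # false
puts starts_lowercase?("à", UNICODE_LOWER)   # true
```

If the library does rely on an ASCII-only class, switching to a Unicode-aware one would let "min. à" behave like "min. to".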
As a test for the test suite:
it "French à after abbreviation" do
  sentence = "85,7 cm (33 3/4 po) min. à 88,9 cm (35 po) max."
  ps = PragmaticSegmenter::Segmenter.new(text: sentence, language: "fr")
  expect(ps.segment).to eq([sentence])
end