pragmatic_segmenter
Punctuation removed even with clean turned off
See the example below: even when the 'clean' parameter is 'false', the asterisk after "Cat." is still removed.
pry(main)> s = "I am a dog. Cat.*"
=> "I am a dog. Cat.*"
pry(main)> ps = PragmaticSegmenter::Segmenter.new(text: s, language: 'en', clean: false)
=> #<PragmaticSegmenter::Segmenter:0x00007fdf5d6890e0
@doc_type=nil,
@language="en",
@language_module=PragmaticSegmenter::Languages::English,
@text="I am a dog. Cat.*">
pry(main)> segments = ps.segment
=> ["I am a dog.", "Cat."]
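To make the loss easy to spot on other inputs, here is a minimal sketch of a check that compares the input text against the returned segments. `dropped_characters` is a hypothetical helper written for this report, not part of pragmatic_segmenter, and it assumes that with `clean: false` only whitespace should differ between the input and the joined segments. The `segments` array is copied from the session above, so the script runs without the gem installed.

```ruby
# Hypothetical helper (not part of pragmatic_segmenter): returns the
# non-whitespace characters of `original` that do not appear, in order,
# in the joined `segments`.
def dropped_characters(original, segments)
  # Characters the segmenter kept, ignoring whitespace differences.
  kept = segments.join.gsub(/\s+/, "").chars
  dropped = []
  original.gsub(/\s+/, "").each_char do |ch|
    if kept.first == ch
      kept.shift      # character survived segmentation
    else
      dropped << ch   # character is missing from the output
    end
  end
  dropped
end

original = "I am a dog. Cat.*"
segments = ["I am a dog.", "Cat."] # output reported in the session above

p dropped_characters(original, segments) # prints ["*"]
```

An empty result would indicate that segmentation only split the text; here the trailing asterisk is reported as dropped.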