pragmatic_segmenter

Punctuation removed even with clean turned off

Open echan00 opened this issue 5 years ago • 0 comments

See the example below: with the 'clean' parameter set to 'false', the asterisk after "Cat." is still removed.

pry(main)> s = "I am a dog. Cat.*"
=> "I am a dog. Cat.*"

pry(main)> ps = PragmaticSegmenter::Segmenter.new(text: s, language: 'en', clean: false)
=> #<PragmaticSegmenter::Segmenter:0x00007fdf5d6890e0
 @doc_type=nil,
 @language="en",
 @language_module=PragmaticSegmenter::Languages::English,
 @text="I am a dog. Cat.*">

pry(main)> segments = ps.segment
=> ["I am a dog.", "Cat."]
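To make the loss explicit, one option is to compare the input text against the returned segments. The sketch below is only an illustration: `dropped_characters` is a hypothetical helper, not part of pragmatic_segmenter, and it assumes each segment appears unchanged and in order in the original text. The segments are hard-coded from the output reported above, so the gem itself is not needed to run it.

```ruby
# Hypothetical helper (not part of pragmatic_segmenter): returns the
# non-whitespace characters of `text` that are not covered by any of
# `segments`, assuming the segments occur in order as substrings of `text`.
def dropped_characters(text, segments)
  remainder = text.dup
  dropped = +""
  segments.each do |segment|
    index = remainder.index(segment)
    next if index.nil? # segment was altered by the segmenter; skip it
    dropped << remainder[0...index]              # text skipped before this segment
    remainder = remainder[(index + segment.length)..] || ""
  end
  dropped << remainder                           # text left after the last segment
  dropped.gsub(/\s+/, "")                        # whitespace between segments is expected
end

text = "I am a dog. Cat.*"
segments = ["I am a dog.", "Cat."] # output reported in the session above
puts dropped_characters(text, segments) # prints "*"
```

An empty result would mean that only whitespace was lost between segments; here the trailing asterisk shows up as dropped.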

echan00, Sep 03 '19 01:09