Questions regarding detecting sentence boundary from paragraph

Open lifelongeek opened this issue 8 years ago • 2 comments

Hi! Thanks for sharing your work.

I wonder how you split the paragraphs of the 'HotelReview Corpus' into sentences.

I guess you detect sentence boundaries from punctuation marks such as (. ! ? ,). But punctuation marks are often ambiguous: the same mark can end a sentence or serve another function, such as marking an abbreviation or a continuation.

Could you share any tips on how to detect sentence boundaries with minimal punctuation ambiguity? Specifically, how did you split sentences at commas?

lifelongeek May 08 '16 06:05

@gmkim90 Have you tried NLTK? Its Punkt sentence tokenizer might help: http://www.nltk.org/_modules/nltk/tokenize/punkt.html and https://www.robincamille.com/2012-02-18-nltk-sentence-tokenizer/ Hope the links help.
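
To make the abbreviation problem concrete, here is a minimal standard-library sketch of the kind of check a sentence tokenizer has to make. It is not Punkt and not the method used in this repository: Punkt learns abbreviations from data, while the `ABBREVIATIONS` set and the `split_sentences` helper below are hypothetical names made up for illustration. The sketch treats `.`, `!` and `?` as candidate boundaries, rejects a period that follows a known abbreviation, and never splits at a comma.

```python
import re

# Hypothetical, hand-written abbreviation list (an assumption for this sketch).
# Punkt builds an equivalent list automatically from a training corpus.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "st", "e.g", "i.e", "vs"}


def split_sentences(text):
    """Split text at ., ! or ? unless the mark looks like part of an abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]+\s+", text):
        punct = match.group().strip()
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        next_char = text[match.end():match.end() + 1]

        # A single period after a known abbreviation is not a boundary.
        if punct == "." and last_word in ABBREVIATIONS:
            continue
        # Require the next sentence to start like one (capital, digit, or quote).
        if next_char and not (next_char.isupper() or next_char.isdigit() or next_char in "\"'("):
            continue

        sentences.append(text[start:match.start() + len(punct)].strip())
        start = match.end()

    # Whatever is left after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


review = ("We stayed at the St. Regis for 3 nights. "
          "The room was clean, but Dr. Smith found it noisy! "
          "Would we return? Yes.")
for sentence in split_sentences(review):
    print(sentence)
```

A fixed list like this will miss abbreviations it has not seen and will mis-handle an abbreviation that really does end a sentence, which is why a trained tokenizer such as Punkt is usually the better choice for a real corpus.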

staywithme23 Aug 01 '16 20:08

How are you running this code?

ghost Dec 29 '18 09:12