Hierarchical-Neural-Autoencoder
Question about detecting sentence boundaries in paragraphs
Hi! Thanks for sharing your work.
I wonder how you split the paragraphs of the 'HotelReview Corpus' into sentences.
I guess you split on punctuation marks such as (. ! ? ,). But punctuation marks are often ambiguous: they can mark the end of a sentence, but they can also serve other functions, such as abbreviation or continuation.
Could you share any tips on how to detect sentence boundaries with minimal punctuation ambiguity? Specifically, how do you split sentences at commas?
@gmkim90 Have you tried NLTK? http://www.nltk.org/_modules/nltk/tokenize/punkt.html / https://www.robincamille.com/2012-02-18-nltk-sentence-tokenizer/ Hope the links help.
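If pulling in NLTK is not an option, the ambiguity described in the question can be illustrated with a small standard-library heuristic. This is only a sketch, not the method used by this repository: the abbreviation list and the function names `split_sentences` and `split_clauses` are made up for illustration, and a trained tokenizer such as Punkt will likely handle far more cases.

```python
import re

# Hypothetical abbreviation list for illustration; extend it for your corpus.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "st", "etc", "e.g", "i.e", "vs"}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on . ! ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        # Word immediately before the punctuation mark.
        words = text[start:match.start()].split()
        prev_word = words[-1].lower() if words else ""
        if match.group().strip() == "." and prev_word in abbreviations:
            continue  # the period belongs to an abbreviation, not a boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def split_clauses(sentence):
    """Split on commas followed by whitespace, so '1,000' stays intact."""
    return [part.strip() for part in re.split(r",\s+", sentence) if part.strip()]


if __name__ == "__main__":
    review = "Dr. Smith stayed at the hotel. The room was clean! Would I return? Yes."
    for sentence in split_sentences(review):
        print(sentence)
    print(split_clauses("The staff was friendly, the room was small, and it cost 1,000 dollars."))
```

Splitting at every comma treats each clause as its own unit, which may or may not suit the model; whether that is desirable depends on how short you want the "sentences" to be.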
How are you running this code?