pytextpreprocess

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)

Written by Joseph Turian. Released under a BSD license.

REQUIREMENTS:

* My Python common library: http://github.com/turian/common and sub-requirements thereof.
* NLTK, for word tokenization, e.g. apt-get install python-nltk
* Splitta, if you want to sentence tokenize.
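
As a rough illustration of the kind of steps listed above (tokenizing, lowercasing, stemming), here is a minimal sketch that calls NLTK directly. It does not use pytextpreprocess's own functions, whose names are not documented in this README. The helper name `preprocess` is hypothetical, and the choice of the Treebank tokenizer and the Porter stemmer is an assumption made for the example.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

# Both objects are rule-based and need no extra NLTK data downloads.
_tokenizer = TreebankWordTokenizer()
_stemmer = PorterStemmer()


def preprocess(text):
    """Hypothetical helper: tokenize, lowercase, then stem each token."""
    tokens = _tokenizer.tokenize(text)
    return [_stemmer.stem(tok.lower()) for tok in tokens]


if __name__ == "__main__":
    print(preprocess("Running dogs."))
```

Sentence splitting is left out of the sketch because it depends on Splitta, which is installed separately.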

The English stoplist is from:
http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
However, I added words at the top of the list (above "a").
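
To show how a stoplist in this one-word-per-line format might be applied, here is a small standard-library sketch. The function names `load_stoplist` and `remove_stopwords` are hypothetical and are not taken from this package, and the four-word sample stands in for the real list.

```python
import io


def load_stoplist(handle):
    """Read one stopword per line, skipping blank lines."""
    return {line.strip().lower() for line in handle if line.strip()}


def remove_stopwords(tokens, stoplist):
    """Keep tokens whose lowercased form is not in the stoplist."""
    return [tok for tok in tokens if tok.lower() not in stoplist]


# Tiny illustrative sample, NOT the actual english.stop contents.
sample = io.StringIO("a\nabout\nthe\nof\n")
stoplist = load_stoplist(sample)

if __name__ == "__main__":
    print(remove_stopwords(["The", "stemming", "of", "a", "word"], stoplist))
```

With the real file, the same loader could be passed an open file handle instead of the in-memory sample.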
