textacy: Add more code examples / tutorials

Expected Behavior

Users expect to learn from code examples and tutorials more so than from reading an API reference. We should oblige.

Current Behavior

Fairly brief usage examples are embedded throughout the code in docstrings, but there are few "end-to-end" examples to follow along with.

Possible Solution

Create a separate directory for tutorials, and add more detailed examples (in Jupyter notebooks?) there. Create additional .rst files to include in the official docs. Examples that users have asked me for:

Generating a terms list (to pass to a Vectorizer) using the textacy.extract and textacy.keyterms modules, e.g. textacy.extract.pos_regex_matches() and textacy.keyterms.sgrank().
How to apply text_utils.clean_terms() to a terms list.
How to remove specific terms from a terms list, e.g. custom stop words.
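As a rough illustration of the last item, removing custom stop words from a terms list can be sketched in plain Python. This is only an assumption about what such a tutorial step might look like: `remove_terms`, `terms`, and `custom_stop_words` are hypothetical names, not part of textacy's API, and the case-insensitive matching is a design choice made for this sketch.

```python
def remove_terms(terms, unwanted):
    """Return `terms` without any entry found in `unwanted` (case-insensitive).

    Hypothetical helper for illustration only; not a textacy function.
    """
    unwanted_lower = {term.lower() for term in unwanted}
    return [term for term in terms if term.lower() not in unwanted_lower]


# Made-up terms list, standing in for the output of a term-extraction step.
terms = ["topic model", "the", "Corpus", "et al", "vectorizer"]
custom_stop_words = {"the", "et al"}

filtered_terms = remove_terms(terms, custom_stop_words)
print(filtered_terms)
```

The filtered list keeps the original order, which matters if the terms are later mapped to columns of a document-term matrix.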

Context

I've gotten more than one email about this... Clearly there's a need.

Jul 02 '17 15:07 bdewilde

I wanted to pick up this issue. Besides the three you mentioned is there anything else that I should look out for? Things I should avoid doing or make sure that I do?

Mar 01 '18 14:03 theSage21

Hey @theSage21, thanks for signing up! 👍

The workflow for topic modeling is mostly standard and well-covered in textacy; it includes file I/O, preprocessing, spaCy parsing, tokenization into terms, vectorization, model training, and visualization of results. That makes it another good candidate for a tutorial.

Investigating similarities of documents / sentences using metrics in the similarity module might be interesting, and could also incorporate some of the network module.
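To give a sense of what such a similarity tutorial could start from, here is a minimal sketch of Jaccard similarity between two tokenized sentences in plain Python. It is not taken from textacy's similarity module; the function name `jaccard`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token sequences: |A & B| / |A | B|.

    Illustrative helper only; returns 0.0 when both inputs are empty.
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Naive whitespace tokenization, used here only to keep the sketch short.
sent_a = "the cat sat on the mat".split()
sent_b = "the cat lay on the rug".split()

score = jaccard(sent_a, sent_b)
print(round(score, 3))
```

Pairwise scores like this one could serve as edge weights if the tutorial goes on to build a sentence graph.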

Really, though, I recommend just doing an analysis that's interesting to you, using textacy. Write it up, and if there are rough spots in terms of usability or gaps in functionality, be sure to let me know! ;)

Mar 01 '18 14:03 bdewilde

Awesome library! Just discovered it at work and am about to give it a go! I'm about to train a topic model and would love to post a tutorial here soon. Great stuff!

Jun 19 '18 15:06 tmthyjames

I'm having an issue with the TF-IDF function, but I suspect I'm simply using it incorrectly or misunderstanding how it works. I've posted the question on SO, and I'd be happy to write a usage doc once I understand it properly.

https://stackoverflow.com/questions/55764766/calculate-td-idf-for-a-single-word-in-textacy
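For reference while reading the question, this is a minimal plain-Python sketch of TF-IDF for a single term. It assumes one common variant (relative term frequency times a smoothed inverse document frequency); textacy's own weighting options may differ, so the numbers are not expected to match its output. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math


def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF of `term` in one document, relative to a tokenized corpus.

    Assumed variant: tf = count / doc length,
    idf = ln((1 + n_docs) / (1 + doc_freq)) + 1.
    """
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    doc_freq = sum(1 for tokens in corpus_tokens if term in tokens)
    idf = math.log((1 + len(corpus_tokens)) / (1 + doc_freq)) + 1
    return tf * idf


# Toy corpus of already-tokenized documents (made-up data).
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "date"],
]

apple_score = tf_idf("apple", docs[0], docs)
print(round(apple_score, 4))
```

A term that does not occur in the document gets a score of 0.0 under this variant, because its term frequency is zero.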

Apr 20 '19 06:04 scarroll32