textacy
Add more code examples / tutorials
Expected Behavior
Users expect to learn from code examples and tutorials more so than from reading an API reference. We should oblige.
Current Behavior
Fairly brief usage examples are embedded throughout the code in docstrings, but there are few "end-to-end" examples to follow along with.
Possible Solution
Create a separate directory for tutorials, and add more detailed examples (in Jupyter notebooks?) there. Create additional rst files to include in the official docs. Examples that users have asked me for:
- Generating a terms list (to pass to a `Vectorizer`), using the `textacy.extract` and `textacy.keyterms` modules, like `textacy.extract.pos_regex_matches()` and `textacy.keyterms.sgrank`.
- How to apply `text_utils.clean_terms()` to a terms list.
- How to remove specific terms from a terms list, e.g. custom stop words.
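For the last item, removing custom stop words, here is a minimal sketch in plain Python. It does not use textacy at all: `remove_terms` is a hypothetical helper, and the terms are assumed to already be plain strings (e.g. the text of extracted spans). The case-insensitive comparison is a choice made for this example.

```python
def remove_terms(terms, stop_terms):
    """Return `terms` without any entry found in `stop_terms` (case-insensitive).

    Hypothetical helper for illustration; not part of textacy.
    """
    stop = {term.lower() for term in stop_terms}
    # Keep the original order and casing of the terms that survive.
    return [term for term in terms if term.lower() not in stop]


terms = ["topic model", "the", "Vectorizer", "et al", "key terms"]
custom_stops = {"the", "et al"}
filtered = remove_terms(terms, custom_stops)
print(filtered)
```

The filtered list can then be handed to whatever vectorization step comes next.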
Context
I've gotten more than one email about this... Clearly there's a need.
I wanted to pick up this issue. Besides the three you mentioned, is there anything else that I should look out for? Things I should avoid doing or make sure that I do?
Hey @theSage21, thanks for signing up! 👍
The workflow for topic modeling is mostly standard and well-covered in textacy; it includes file io, preprocessing, spacy parsing, tokenization into terms, vectorization, model training, and visualization of results. This is another good candidate.
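To make the vectorization step of that workflow concrete, here is a rough sketch in plain Python of turning tokenized documents into a document-term count matrix. This is not textacy's `Vectorizer`; `build_doc_term_matrix` is a hypothetical function, and a real pipeline would use a sparse matrix rather than nested lists.

```python
from collections import Counter


def build_doc_term_matrix(tokenized_docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in document i.

    Hypothetical helper for illustration; not part of textacy.
    """
    # A sorted vocabulary gives every term a stable column index.
    vocab = sorted({term for doc in tokenized_docs for term in doc})
    index = {term: col for col, term in enumerate(vocab)}
    matrix = []
    for doc in tokenized_docs:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


docs = [["topic", "model", "topic"], ["model", "training"]]
vocab, matrix = build_doc_term_matrix(docs)
print(vocab)
print(matrix)
```

A matrix like this is the input a topic model is trained on; the preprocessing and parsing steps determine what ends up in each token list.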
Investigating similarities of documents / sentences using metrics in the `similarity` module might be interesting, and could also incorporate some of the `network` module.
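As an illustration of the kind of metric such a tutorial could cover, here is a minimal plain-Python sketch of Jaccard similarity between two token lists. It is a generic implementation, not the code in textacy's similarity module; the function name and the convention of returning 0.0 when both inputs are empty are assumptions made for this example.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity: size of the intersection over size of the union.

    Generic sketch for illustration; not textacy's implementation.
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty inputs have similarity 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


score = jaccard("the cat sat".split(), "the cat ran".split())
print(score)
```

Pairwise scores like this could also serve as edge weights when building a graph of sentences or documents.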
Really, though, I recommend just doing an analysis that's interesting to you, using `textacy`. Write it up, and if there are rough spots in terms of usability or gaps in functionality, be sure to let me know! ;)
Awesome library! Just discovered it at work and am about to give it a go! I'm about to train a topic model and would love to post a tutorial here soon. Great stuff!
I'm having an issue with the TF-IDF function, but I think it's possible that I am using it incorrectly or misunderstanding how it is meant to be used. I have posted about it on SO, and would be happy to write a usage doc once I understand it properly.
https://stackoverflow.com/questions/55764766/calculate-td-idf-for-a-single-word-in-textacy
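For reference, here is a minimal plain-Python sketch of TF-IDF for a single word, using one common textbook weighting (relative term frequency times `log(N / df)`). This is an assumption for illustration only: textacy's own vectorizer may apply different weighting or smoothing, and `tfidf` is a hypothetical function, not a textacy call.

```python
import math


def tfidf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where `corpus` is a list of tokenized documents.

    Textbook variant for illustration; not textacy's implementation.
    """
    if not doc:
        return 0.0
    # Term frequency: share of the document's tokens that are `term`.
    tf = doc.count(term) / len(doc)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for other in corpus if term in other)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf


corpus = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
weight = tfidf("a", corpus[0], corpus)
print(weight)
```

With this variant, a word that appears in every document gets a weight of zero, which is one reason results can differ between implementations that smooth the IDF term and those that do not.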