Python-Text-Analysis
D-Lab's 12-hour introduction to text analysis with Python. Learn how to perform bag-of-words, sentiment analysis, topic modeling, word embeddings, and more, using scikit-learn, NLTK, Gensim, and spaCy...
error: array sizes (30, 30, 50) don't line up
@rbarreto -- remove alpha from all the docs in NMF (notebook on topic modeling) and in solutions notebook too
In lesson 4, the `alpha` argument of the `NMF` class from `sklearn.decomposition` is deprecated. Regularization of the $W$ and $H$ matrices is now controlled by two separate arguments, `alpha_W` and `alpha_H`.
While working on Day 2, @rbarreto and I saw that the Binder takes about 10 minutes to load. This has been an issue for people who are affiliated with...
This issue was present in the first notebook and also extended to the end of the notebook in Challenge #6. This seems to be an environmental issue. Lots of participants...
Right before Challenge 5 (Applying a lemmatizer to a text), the lemmatizer fails to load. The fix that worked for Renata in the Binder was `nltk.download('omw-1.4')`. @rbarreto
We should allow more time to go over the functions used in Challenge 2. The participants need a bit more time to understand what each function does so they can...
The Tokenization section has an issue with spaCy. The following seems to work on macOS: `!pip install -U spacy`. [Here's](https://stackoverflow.com/questions/74451907/import-spacy-error-cannot-import-name-dataclass-transform) a link to a Stack Overflow question that has different...
There is a discrepancy between tokenization and stopword removal in the Binder notebook for lesson 1 (preprocessing). @rbarreto