Jonathan Besomi comments




Implement Automated Readability Index, Closes #20; new PR; Waiting until Checking for NaNs is implemented.

This looks very good! My only question concerns "Score is NaN if it cannot be computed (e.g. if the number of sentences is 0).": are we testing this scenario?
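The Automated Readability Index is commonly defined as ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43, so it cannot be computed when there are no sentences (or no words). Below is a minimal sketch of a function that returns NaN in that case. It is hypothetical, not the code under review in the PR, and it assumes naive regex-based counting: words are `\w+` runs, sentences are non-empty segments between `.`, `!` or `?`.

```python
import math
import re

import pandas as pd


def automated_readability_index(text: str) -> float:
    """Hypothetical ARI helper with naive counting rules (assumption)."""
    words = re.findall(r"\w+", text)
    # Non-empty segments between sentence terminators count as sentences.
    sentences = [seg for seg in re.split(r"[.!?]+", text) if seg.strip()]
    if not words or not sentences:
        # The formula divides by both counts, so the score is undefined.
        return math.nan
    characters = sum(len(w) for w in words)  # word characters only
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


s = pd.Series(["The cat sat on the mat.", ""])
scores = s.apply(automated_readability_index)
```

A test for the NaN scenario can then simply check that `math.isnan(automated_readability_index(""))` holds.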

Implement Automated Readability Index, Closes #20; new PR; Waiting until Checking for NaNs is implemented.

Henri, this function will probably fail this test: #86. What if we put it aside as a draft for now, work on the rest, and come back to it...

punctuation not being removed correctly using `preprocessing.clean`

Thank you @henrifroese. @aliforgetti do you have any updates?

📝 Documentation next steps: checklist

Perfect! For the topic modeling tutorial, something similar to this one might be a good start: [discovering-hidden-topics-python](https://www.datacamp.com/community/tutorials/discovering-hidden-topics-python)

How to provide multilingual support

> Just used Texthero for the first time yesterday in Portuguese. Pipeline for preprocessing seems fine, except for stopwords. A solution like @AlfredWGA mentioned would be very much appreciated inside...

How to provide multilingual support

> For Asian languages (Chinese, Japanese...), word segmentation is an essential step in preprocessing. We usually remove non-textual characters from the corpus, making them look like naturally written texts, and segment...

How to provide multilingual support

You can solve it like this:

```
import texthero as hero
import pandas as pd

s = pd.Series(["is is a stopword"])
custom_set_of_stopwords = ['is']
pipeline = [
    lambda s: hero.remove_stopwords(s,...
```
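The idea behind the snippet above is that a pipeline is a list of functions that each take and return a Series, with the custom stopword set bound through a lambda. Here is a self-contained sketch of that pattern in plain pandas; `remove_custom_stopwords` and `apply_pipeline` are hypothetical helpers, not Texthero functions, and tokenization is assumed to be a simple whitespace split.

```python
import pandas as pd


def remove_custom_stopwords(s: pd.Series, stopwords) -> pd.Series:
    """Hypothetical helper: drop whitespace-separated tokens listed in `stopwords`."""
    stopwords = set(stopwords)
    return s.apply(lambda text: " ".join(w for w in text.split() if w not in stopwords))


def apply_pipeline(s: pd.Series, pipeline) -> pd.Series:
    """Apply each Series -> Series function in `pipeline`, in order."""
    for step in pipeline:
        s = step(s)
    return s


s = pd.Series(["is is a stopword"])
custom_set_of_stopwords = ["is"]
pipeline = [
    lambda s: s.str.lower(),
    # The lambda binds the custom set, so every step keeps a Series -> Series signature.
    lambda s: remove_custom_stopwords(s, custom_set_of_stopwords),
]
result = apply_pipeline(s, pipeline)
```

Keeping every step as a one-argument function is what lets arbitrary parameters (such as a custom stopword set) be mixed into a single list of steps.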

How to provide multilingual support

Both solutions work; either open an issue or send me an email: jonathanbesomi__AT__gmail.com

How to provide multilingual support

I completely agree with what you are proposing, i.e. to **allow removing stopwords for a specific language**. The only big question (and the main purpose of this discussion tab)...
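One way to picture language-specific stopword removal is a mapping from a language code to a stopword set, selected by a parameter. This is a minimal sketch under stated assumptions: the `STOPWORDS` dictionary holds deliberately tiny, hypothetical sets (in practice they could come from a resource such as NLTK's stopwords corpus), and `remove_stopwords_for_language` is a hypothetical name, not a Texthero API.

```python
import pandas as pd

# Hypothetical, deliberately tiny stopword sets for illustration only.
STOPWORDS = {
    "en": {"a", "an", "the", "is", "of"},
    "pt": {"a", "o", "de", "que", "e"},
}


def remove_stopwords_for_language(s: pd.Series, language: str) -> pd.Series:
    """Drop whitespace-separated tokens that are stopwords in `language`."""
    try:
        stopwords = STOPWORDS[language]
    except KeyError:
        raise ValueError(f"No stopword list for language {language!r}") from None
    # Compare lowercased tokens, but keep the original casing in the output.
    return s.apply(
        lambda text: " ".join(w for w in text.split() if w.lower() not in stopwords)
    )


s = pd.Series(["O gato de Maria", "The cat is black"])
portuguese = remove_stopwords_for_language(s, "pt")
english = remove_stopwords_for_language(s, "en")
```

Raising on an unknown language code, rather than silently falling back to English, makes a missing stopword list visible to the user.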

How to provide multilingual support

Hey @AlfredWGA! Apologies, what do you mean by "integrated"? (Also, remove_punctuation is _integrated_ into remove_stopwords after tokenization.) I agree. To better understand the problem, we should probably create a...
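To illustrate what handling both steps after tokenization can look like: once text is split into tokens, punctuation tokens and stopwords can be dropped by the same filter. This is a minimal sketch, not Texthero's implementation; the regex tokenizer and both helper names are hypothetical, and `string.punctuation` only covers ASCII punctuation.

```python
import re
import string

import pandas as pd

# Hypothetical tokenizer: runs of word characters, or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(s: pd.Series) -> pd.Series:
    return s.apply(TOKEN_PATTERN.findall)


def remove_stopwords_and_punctuation(tokens: pd.Series, stopwords) -> pd.Series:
    """Filter token lists, dropping ASCII punctuation tokens and stopwords in one pass."""
    stopwords = set(stopwords)

    def keep(token: str) -> bool:
        is_punctuation = all(ch in string.punctuation for ch in token)
        return not is_punctuation and token.lower() not in stopwords

    return tokens.apply(lambda toks: [t for t in toks if keep(t)])


s = pd.Series(["Hello, world! This is a test."])
tokens = tokenize(s)
cleaned = remove_stopwords_and_punctuation(tokens, {"this", "is", "a"})
```

Working on tokens means punctuation such as `,` is already isolated, so it can be matched exactly instead of being stripped from inside words.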