texthero Add correct_mistakes(s)

Or at least check how many mistakes there are in a sentence.

See: https://pypi.org/project/pyenchant/

May 08 '20 16:05 jbesomi

@jbesomi I have checked the library. We can create a function count_mistakes that returns the number of mistakes per sentence.
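A minimal sketch of what count_mistakes could look like, assuming the spell checker is passed in as a `check(word) -> bool` callable. The tokenizer regex, `KNOWN_WORDS`, and `toy_check` are hypothetical stand-ins for illustration only, not part of texthero or pyenchant.

```python
import re
from typing import Callable

# Hypothetical tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def count_mistakes(text: str, check: Callable[[str], bool]) -> int:
    """Return the number of words in `text` that `check` rejects."""
    return sum(1 for word in WORD_RE.findall(text) if not check(word))


# Toy stand-in for a real dictionary, used only for this demo.
KNOWN_WORDS = {"this", "is", "a", "sentence"}


def toy_check(word: str) -> bool:
    """Accept a word only if it is in the toy dictionary (case-insensitive)."""
    return word.lower() in KNOWN_WORDS


print(count_mistakes("This is a sentnce", toy_check))  # prints 1
```

With pyenchant, `check` could presumably be `enchant.Dict("en_US").check`, provided an en_US dictionary is installed. For a pandas Series `s`, something like `s.apply(lambda text: count_mistakes(text, check))` would give one count per row.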

For correcting mistakes, the library has a method suggest(word) that returns a list of suggestions for the given word. We could add a method correct_mistakes that by default picks the first word in the suggestions and replaces the incorrect word with it. Do you have another suggestion for this?
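The "first suggestion wins" idea could be sketched roughly as follows, assuming `check` and `suggest` callables shaped like pyenchant's `Dict.check` and `Dict.suggest`. The toy dictionary and suggestion table below are made up for the demo; a real checker would supply its own.

```python
import re
from typing import Callable, List

# Hypothetical tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def correct_mistakes(
    text: str,
    check: Callable[[str], bool],
    suggest: Callable[[str], List[str]],
) -> str:
    """Replace each rejected word with its first suggestion, if there is one."""

    def fix(match: "re.Match[str]") -> str:
        word = match.group(0)
        if check(word):
            return word
        candidates = suggest(word)
        # Leave the word untouched when the checker has nothing to offer.
        return candidates[0] if candidates else word

    return WORD_RE.sub(fix, text)


# Toy stand-ins for a real spell checker, used only for this demo.
KNOWN_WORDS = {"this", "is", "a", "sentence"}
TOY_SUGGESTIONS = {"sentnce": ["sentence", "sentience"]}


def toy_check(word: str) -> bool:
    return word.lower() in KNOWN_WORDS


def toy_suggest(word: str) -> List[str]:
    return TOY_SUGGESTIONS.get(word.lower(), [])


print(correct_mistakes("This is a sentnce, xyzzy.", toy_check, toy_suggest))
# prints: This is a sentence, xyzzy.
```

Substituting via the regex keeps punctuation and spacing intact. Note that always taking the first suggestion can silently change the meaning of a text, so an opt-out or a confidence threshold might be worth considering.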

May 22 '20 21:05 selimelawwa

Good idea. *Return the number of mistakes per pandas Series row.

May 22 '20 21:05 jbesomi

OK, but what about correct_mistakes?

May 22 '20 22:05 selimelawwa

What you proposed is fine. One thing: before going with pyenchant, it would be great to select two or three similar packages, test and rank them, and only then implement count_mistakes and correct_mistakes.

May 22 '20 22:05 jbesomi

Hi, I checked and these are the alternative options:

symspellpy, which is a Python port of SymSpell
spacy_hunspell
pyspellchecker

These sources claim SymSpell should be the best in terms of performance (time):

With SymSpell we can implement automatic_correct_mistakes, but it will be a bit more complicated than with PyEnchant.

Please check and let me know your opinion.

May 24 '20 18:05 selimelawwa

Great. Neither source cites or benchmarks pyenchant. We should probably test both pyenchant and symspellpy ourselves, for both quality of results and execution time, and pick the best. In the end, we might decide to keep both and let the user choose. In that case we would need a benchmark anyway, to understand which one works best in which situation. What's your opinion, Selim?

May 25 '20 12:05 jbesomi

Sorry for the late reply; we had holidays here in Egypt after Ramadan. Yes, I think we should test both too, so we can determine ourselves which is better and for which use case. However, how do you suggest testing the quality of the results on large data? I will start on them tomorrow and keep you updated.

Jun 01 '20 22:06 selimelawwa

No problem; thank you for your help! For the performance comparison, just pick a large NLP dataset and compare the execution time. For quality, I guess you need to look at the results yourself and decide.
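As a rough sketch of the execution-time comparison, a small harness could time each candidate over the same list of texts and keep the best of a few runs. The function name `time_correctors` and the two placeholder correctors are hypothetical; real runs would plug in wrappers around pyenchant and symspellpy and a large NLP dataset.

```python
import time
from typing import Callable, Dict, List


def time_correctors(
    correctors: Dict[str, Callable[[str], str]],
    texts: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds per corrector over `texts`."""
    results: Dict[str, float] = {}
    for name, correct in correctors.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for text in texts:
                correct(text)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Placeholder correctors, used only to show the harness running.
sample_texts = ["This is a sentnce"] * 1000
timings = time_correctors(
    {"identity": lambda text: text, "lowercase": str.lower},
    sample_texts,
)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.6f} s")
```

This only covers speed. Quality would still need a manual look at the corrected output, or a small hand-labelled sample to compare against.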

Jun 02 '20 05:06 jbesomi
