
Change the default values of :func:`~.reduce_repeated_substring` parameters for general use

Open mohamadmansourX opened this issue 2 years ago • 2 comments

Since this repo focuses on the Arabic language, I'd like to suggest changing the minimum repeated-character threshold from 3 to 4, because three consecutive occurrences of the same character do exist in valid Arabic words.

(Note that even when Shadda is used, three consecutive occurrences can still appear.)

See the two examples below:

mohamadmansourX avatar Sep 20 '21 13:09 mohamadmansourX

@mohamadmansourX

Thanks for contributing to Maha.

  1. You are right: this library focuses mainly on the Arabic language, and it has words like these (تتتابع, تشتتت, تَشَتَّتَت). I support keeping such words unchanged.

  2. Your pull request has failing tests. Please follow the contribution guidelines described in the Contributing section of the documentation.

Thanks again for your contribution!

mohammad-albarham avatar Sep 20 '21 20:09 mohammad-albarham

Thank you for pointing this out. I am in favor of changing the default values of min_repeated and reduce_to to 4 and 3 respectively, as that makes the function safer for general use. However, this should not happen without notifying the community that the default behavior has changed. For now, I suggest leaving this PR open for a while so that more people from the community can give their opinion on the change.

In the meantime, please fix the tests.

TRoboto avatar Sep 21 '21 11:09 TRoboto
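
To make the proposal concrete, here is a minimal standalone sketch of the behavior under discussion. It does not call Maha; `reduce_repeats` is a hypothetical stand-in written with plain `re`, and it only handles repeated single characters, so the library's actual handling of substrings may differ. The parameter names `min_repeated` and `reduce_to` are taken from the thread.

```python
import re


def reduce_repeats(text: str, min_repeated: int, reduce_to: int) -> str:
    """Hypothetical stand-in, not Maha's implementation.

    Collapses any run of at least `min_repeated` identical characters
    down to `reduce_to` copies of that character.
    """
    # (.) captures one character; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_repeated - 1), re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * reduce_to, text)


# Thresholds 3 and 2 (the values the issue describes as current)
# damage valid words that contain three identical letters in a row.
print(reduce_repeats("تتتابع", min_repeated=3, reduce_to=2))  # تتابع
print(reduce_repeats("تشتتت", min_repeated=3, reduce_to=2))   # تشتت

# Thresholds 4 and 3 (the proposed values) leave those words intact...
print(reduce_repeats("تتتابع", min_repeated=4, reduce_to=3))  # تتتابع

# ...while still shortening clearly exaggerated repetition.
print(reduce_repeats("هههههه", min_repeated=4, reduce_to=3))  # ههه
```

With the proposed values, a run of exactly three identical letters is left alone, and longer runs are reduced to three rather than two.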