Add the option to ignore Harakat when removing or replacing

Open: xaleel opened this issue 1 year ago • 1 comment

What problem are you trying to solve?

Currently, the cleaner functions treat two strings as different when their Harakat (diacritics) differ, which is the correct default behavior. However, it would be useful to give the user an option to ignore Harakat when comparing strings.

Examples (if relevant)

Current:

>> from maha.cleaners.functions import remove
>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة")
>> output
يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى

Suggested:

>> from maha.cleaners.functions import remove
>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة", ignore_harakat=True)
>> output
يُدَرِّسُ العَرَبِيَّةَ الفُصْحَى
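
One possible way to get this behavior, sketched with the standard `re` module only: strip Harakat from the search expression, then allow any run of Harakat after each remaining character. This is not Maha's implementation. The helper names `harakat_insensitive_pattern` and `remove_ignoring_harakat` are hypothetical, and treating Harakat as the code points U+064B to U+0652 is an assumption that may differ from the library's own definition. The sketch also handles literal text only, not arbitrary regular expressions.

```python
import re

# Assumption: Harakat are the Arabic marks U+064B (fathatan) to U+0652 (sukun).
HARAKAT_RANGE = "\u064B-\u0652"


def harakat_insensitive_pattern(expression: str) -> str:
    """Build a regex that matches `expression` regardless of Harakat.

    Hypothetical helper: treats `expression` as literal text, not as a regex.
    """
    # Drop any Harakat the caller typed, so both sides are compared bare.
    bare = re.sub(f"[{HARAKAT_RANGE}]", "", expression)
    # After every character, tolerate zero or more Harakat in the target text.
    return "".join(re.escape(ch) + f"[{HARAKAT_RANGE}]*" for ch in bare)


def remove_ignoring_harakat(text: str, expression: str) -> str:
    """Remove every Harakat-insensitive occurrence of `expression` from `text`."""
    result = re.sub(harakat_insensitive_pattern(expression), "", text)
    # Collapse the double space left behind by a removed word.
    return re.sub(r" {2,}", " ", result).strip()


output = remove_ignoring_harakat("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", "اللغة")
print(output)
```

Building the pattern from the bare expression keeps the original Harakat in the rest of the text untouched, which stripping Harakat from the whole input before matching would not.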

Definition of Done

  • It must adhere to the coding style used in the defined cleaner functions.
  • The implementation should cover most use cases.
  • Tests must be added.

xaleel, Aug 05 '22 17:08

I like this feature, and I believe it can be extended to the contains and replace functions. If you want to work on the implementation, make sure to illustrate exactly when it will be used and add an example of a use case.

TRoboto, Aug 05 '22 18:08