Maha
Add the option to ignore Harakat when removing or replacing
What problem are you trying to solve?
Currently, the cleaner functions do not treat two strings as matching if they differ in Harakat (diacritics), which is the correct default behavior. However, it would be useful to give the user the option to ignore Harakat when matching strings.
Examples (if relevant)
Current:
>>> from maha.cleaners.functions import remove
>>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة")
>>> output
يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى
Suggested:
>>> from maha.cleaners.functions import remove
>>> output = remove("يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى", custom_expressions=r"اللغة", ignore_harakat=True)
>>> output
يُدَرِّسُ العَرَبِيَّةَ الفُصْحَى
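One possible way the proposed ignore_harakat option could work is to rewrite the search string into a regular expression that tolerates optional Harakat after every letter. The sketch below uses only Python's standard re module and is not Maha's API: harakat_insensitive and remove_ignoring_harakat are hypothetical names, Harakat are assumed to be the eight diacritics U+064B to U+0652, the expression is treated as literal text (Maha's custom expressions are regexes), and the whitespace collapsing is an assumption made to reproduce the suggested output above.

```python
import re

# Assumption: "Harakat" = the eight Arabic diacritics U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
HARAKAT = r"[\u064B-\u0652]"


def harakat_insensitive(expression: str) -> str:
    """Build a regex that matches ``expression`` with or without Harakat.

    Hypothetical simplification: ``expression`` is treated as literal text.
    """
    # Compare bare letters: drop any Harakat the caller typed.
    bare = re.sub(HARAKAT, "", expression)
    # Escape each letter and allow zero or more Harakat right after it.
    return "".join(re.escape(ch) + HARAKAT + "*" for ch in bare)


def remove_ignoring_harakat(text: str, expression: str) -> str:
    """Hypothetical stand-in for remove(..., ignore_harakat=True)."""
    pattern = harakat_insensitive(expression)
    if not pattern:  # nothing left to match after stripping Harakat
        return text
    output = re.sub(pattern, "", text)
    # Removing a whole word leaves a double space; collapse whitespace.
    return re.sub(r"\s+", " ", output).strip()


text = "يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى"
print(remove_ignoring_harakat(text, "اللغة"))  # يُدَرِّسُ العَرَبِيَّةَ الفُصْحَى
```

Because Harakat are also stripped from the expression itself, searching for "اللغة" or "اللُّغَةَ" gives the same result.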
Definition of Done
- It must adhere to the coding style used in the existing cleaner functions.
- The implementation should cover most use cases.
- Tests must be added.
I like this feature, and I believe it can be extended to the contains and replace functions. If you want to work on the implementation, make sure to illustrate exactly when it will be used and add an example of a use case.
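Extending the same idea to contains and replace could look like the following sketch. It is again standard-library Python rather than Maha's implementation: contains_ignoring_harakat and replace_ignoring_harakat are hypothetical names, Harakat are assumed to be U+064B to U+0652, and the search word is treated as literal text. A typical use case is searching fully vocalised text, such as poetry, with a keyword typed without Harakat.

```python
import re

# Assumption: Harakat are the Arabic diacritics U+064B..U+0652.
HARAKAT = r"[\u064B-\u0652]"


def _harakat_insensitive(word: str) -> str:
    # Strip Harakat from the search word, then allow optional Harakat
    # after every escaped letter.
    bare = re.sub(HARAKAT, "", word)
    return "".join(re.escape(ch) + HARAKAT + "*" for ch in bare)


def contains_ignoring_harakat(text: str, word: str) -> bool:
    """Hypothetical stand-in for contains(..., ignore_harakat=True)."""
    pattern = _harakat_insensitive(word)
    return bool(pattern) and re.search(pattern, text) is not None


def replace_ignoring_harakat(text: str, word: str, replacement: str) -> str:
    """Hypothetical stand-in for replace(..., ignore_harakat=True)."""
    pattern = _harakat_insensitive(word)
    if not pattern:
        return text
    # A function replacement inserts ``replacement`` literally (no \1 expansion).
    return re.sub(pattern, lambda _match: replacement, text)


text = "يُدَرِّسُ اللُّغَةَ العَرَبِيَّةَ الفُصْحَى"
print(contains_ignoring_harakat(text, "العربية"))  # True
print(replace_ignoring_harakat(text, "اللغة", "النحو"))
```

Like any plain substring search, this also matches inside longer words (for example, اللغة inside باللغة); whether to respect word boundaries would be a separate design decision.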