
Suggestion match with regexp_match not working when using tashkeel

Open sohaibafifi opened this issue 5 years ago • 8 comments

Example:

<rule id="word_use_0005_muta2akid" name="متأكد">
        <pattern>
          <token inflected="yes">متأكد</token>
        </pattern>
        <message>يفضل أن يقال:
        <suggestion><match no="1" regexp_match="متأكد" regexp_replace="متحقِّق"/></suggestion>
        <suggestion><match no="1" regexp_match="متأكد" regexp_replace="متيقِّن"/></suggestion>
متيقن أو متحقق بدلا من متأكد</message>
        <example correction="متحقِّق|متيقِّن" type="incorrect"> هل أنت <marker>متأكد</marker>؟</example>
        <!--  Wrong: هل أنتَ متأكِّد؟ -->
        <!--Correct: هل أنتَ متيقِّن؟ / هل أنتَ متحقق؟ -->
</rule>

With the sentence:

هل أنتَ مُتأكِّد أنّنا نسير في الاتّجاه الصّحيح؟

Output:

1.) Line 1, column 9, Rule ID: word_use_0005_muta2akid[4]
Message: يفضل أن يقال:
        'مُتأكِّد'
        'مُتأكِّد'
متيقن أو متحقق بدلا من متأكد
Suggestion: مُتأكِّد
Rule source: /org/languagetool/rules/ar/grammar.xml
هل أنتَ مُتأكِّد أنّنا نسير في الاتّجاه الصّحيح؟
        ^^^^^^^^                                

The problem: regexp_match does not match when the word contains tashkeel (diacritics), so both suggestions come back as the original, unchanged word.
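A minimal sketch of why the match fails, using Python's re module as a stand-in for the Java regex engine that LanguageTool uses (an assumption made only for illustration; literal matching behaves the same way in both). The pattern and replacement are taken from the rule above, and the variable names are hypothetical.

```python
import re

# Pattern and replacement copied from the rule; the pattern has no tashkeel.
pattern = "متأكد"
replacement = "متحقِّق"

plain = "متأكد"        # token written without tashkeel
vocalized = "مُتأكِّد"  # token as it appears in the test sentence

# Without tashkeel the literal pattern is found and replaced.
plain_result = re.sub(pattern, replacement, plain)

# With tashkeel, combining marks sit between the base letters, so the
# literal pattern is never found and the token is returned unchanged.
vocalized_result = re.sub(pattern, replacement, vocalized)

print(plain_result == replacement)
print(vocalized_result == vocalized)
```

The second case mirrors the output above, where the suggestion is identical to the flagged word.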

sohaibafifi avatar Jun 12 '20 15:06 sohaibafifi

I suggest reprogramming the "case_conversion" attribute so that it can strip or ignore tashkeel.

https://dev.languagetool.org/tips-and-tricks#changing-the-case-of-matched-word
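As a rough illustration of the "strip" option, the sketch below removes tashkeel from the token before the regex replacement is applied. It is written in Python for brevity and is not the code from LanguageTool; the helper names remove_tashkeel and suggest are hypothetical, and the range U+064B to U+0652 is an assumption about which marks count as tashkeel.

```python
import re

# Assumption: tashkeel = tanween, fatha, damma, kasra, shadda, sukun
# (U+064B..U+0652). A real implementation may cover more marks.
TASHKEEL = re.compile("[\u064B-\u0652]")


def remove_tashkeel(text: str) -> str:
    """Return text with the Arabic diacritic marks above removed."""
    return TASHKEEL.sub("", text)


def suggest(token: str, pattern: str, replacement: str) -> str:
    """Apply the regex replacement to the tashkeel-free form of the token."""
    return re.sub(pattern, replacement, remove_tashkeel(token))


# The vocalized token from the test sentence now yields the replacement.
print(suggest("مُتأكِّد", "متأكد", "متيقِّن"))
```

Only the matched token is stripped; the replacement text keeps its own tashkeel. The "ignore" option would instead leave the token alone and make the pattern tolerate optional marks between letters.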

linuxscout avatar Mar 03 '21 19:03 linuxscout

I found a way to do this by making some changes to the code. Can you update your repository from upstream so that I can open a PR for this change? Thanks.

linuxscout avatar Mar 04 '21 18:03 linuxscout

The commit:

https://github.com/linuxscout/languagetool/commit/8d0f2ea46a83333c478d6b7be12c2c2cf3812949

linuxscout avatar Mar 04 '21 18:03 linuxscout

@linuxscout My repo is now synced with upstream.

sohaibafifi avatar Mar 05 '21 10:03 sohaibafifi

To be closed

linuxscout avatar Mar 07 '21 11:03 linuxscout

Should I include the removeTashkeel method?

sohaibafifi avatar Mar 08 '21 19:03 sohaibafifi

I tried to add it; take a look at the PR. I updated core files. Perhaps there is a way to limit the changes to the Arabic module only.

linuxscout avatar Mar 08 '21 20:03 linuxscout

To be closed

linuxscout avatar Feb 28 '22 10:02 linuxscout