pyarabic
Convert Arabic glyphs into standard letters
Following up on the earlier issue 57, we propose adding a new function to unshape this kind of text.
Salam, I tested the given words with pyarabic as shown in the test further down. The words contain encoded glyphs rather than standard letters, so they must be converted to ordinary letters before pyarabic can recognize them.
To convert a glyph-based word into a string of standard letters, you can use the following. NB: the second unshaping_word call is used only to reverse the resulting word.
word = "ﻣﺴﺎﻣﻌﻬﻢ"
from pyarabic.unshape import unshaping_word
unshaping_word(unshaping_word(word))
'مسامعهم'
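As a cross-check that does not depend on pyarabic, the same conversion can be sketched with Python's standard library. This is a swapped-in technique, not the proposed `unshaping_word` function: Unicode NFKC normalization maps Arabic presentation-form glyphs to their base letters. The helper name `unshape_nfkc` is hypothetical, and the word is built from the code points reported by the test so the example does not depend on how the glyphs are rendered.

```python
import unicodedata


def unshape_nfkc(text: str) -> str:
    """Map presentation-form glyphs to standard letters via NFKC.

    Hypothetical helper, not part of pyarabic. NFKC also applies every
    other compatibility mapping (ligatures are expanded, for instance),
    so it changes more than Arabic glyphs. It does not reorder characters.
    """
    return unicodedata.normalize("NFKC", text)


# Code points of the glyph-encoded word, as printed by ord() in the test.
glyph_codes = (65251, 65204, 65166, 65251, 65228, 65260, 65250)
glyph_word = "".join(chr(cp) for cp in glyph_codes)

plain_word = unshape_nfkc(glyph_word)

# Every resulting character now lies in the basic Arabic block (U+0600-U+06FF).
print([hex(ord(c)) for c in plain_word])
```

Because NFKC keeps the character order, no second call is needed to reverse the result here. Whether its side effects on non-Arabic compatibility characters are acceptable depends on the text being processed.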
- The test used to detect the problem
```
>>> import pyarabic.araby as ar
>>> lst=["اﻟﻤﺴﺌﻮﻟﻴﺔ","ﻣﺴﺎﻣﻌﻬﻢ","ﻓﻜﻠﻨﺎ","ﻣﺒﺎدراﺗﻨﺎ","ﻓﻬﻢ","اﻟﻤﻨﻈﻮﻣﺔ"]
>>> for i in lst:
...     print(i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     print("%s"%i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     for c in i :
...         print(c, ord(c), ar.name(c))
...
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﺴ 65204
ﺌ 65164
ﻮ 65262
ﻟ 65247
ﻴ 65268
ﺔ 65172
ﻣ 65251
ﺴ 65204
ﺎ 65166
ﻣ 65251
ﻌ 65228
ﻬ 65260
ﻢ 65250
ﻓ 65235
ﻜ 65244
ﻠ 65248
ﻨ 65256
ﺎ 65166
ﻣ 65251
ﺒ 65170
ﺎ 65166
د 1583 دال
ر 1585 راء
ا 1575 ألف
ﺗ 65175
ﻨ 65256
ﺎ 65166
ﻓ 65235
ﻬ 65260
ﻢ 65250
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﻨ 65256
ﻈ 65224
ﻮ 65262
ﻣ 65251
ﺔ 65172
```
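The code points above explain the False results: values such as 65247 fall in the Unicode Arabic Presentation Forms blocks, while standard letters such as 1575 sit in the basic Arabic block. A minimal sketch of a detector built on that observation follows. The name `has_presentation_forms` and the range table are assumptions for illustration, not pyarabic API.

```python
# Arabic Presentation Forms-A and -B. The upper bound of the second range
# stops at U+FEFC on purpose, so that U+FEFF (the byte order mark) is not
# reported as a glyph.
PRESENTATION_RANGES = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B
)


def has_presentation_forms(text: str) -> bool:
    """Return True if any character is an Arabic presentation-form glyph.

    Hypothetical helper for illustration only.
    """
    return any(
        low <= ord(ch) <= high
        for ch in text
        for low, high in PRESENTATION_RANGES
    )


# 65247 is one of the glyph code points from the test output,
# 1575 is the standard letter alef.
print(has_presentation_forms(chr(65247)))  # glyph-encoded character
print(has_presentation_forms(chr(1575)))   # standard letter
```

A check like this could run before `is_arabicword` to decide whether a word needs unshaping first.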