Transliteration from Arabic not working for continuous text

Open twardoch opened this issue 4 years ago • 4 comments

Transliteration from Arabic is not working for continuous text. It works for single space-separated characters. The Arabic Wiktionary module is a bit complex, so we need to investigate and add some special processing.

twardoch avatar Aug 11 '21 02:08 twardoch

Should I implement preprocessing and postprocessing functions in this case? That is, tokenize the continuous text in preprocessing and concatenate the transliteration results in postprocessing.
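
A rough sketch of what that could look like, not actual wiktra code. `transliterate_continuous` and `translit_token` are hypothetical names; `translit_token` stands in for whatever per-token transliteration call already works, and splitting on whitespace is only one possible token granularity (an assumption).

```python
import re
from typing import Callable


def transliterate_continuous(text: str, translit_token: Callable[[str], str]) -> str:
    """Tokenize, transliterate each token, then stitch the results back together.

    `translit_token` is a hypothetical stand-in for the per-token
    transliteration call; it is not part of any real API.
    """
    # Preprocessing: split on runs of whitespace. The capturing group makes
    # re.split keep the separators, so the original spacing can be restored.
    pieces = re.split(r"(\s+)", text)

    # Postprocessing: transliterate only the non-whitespace pieces and
    # concatenate everything in the original order.
    return "".join(
        piece if (not piece or piece.isspace()) else translit_token(piece)
        for piece in pieces
    )


if __name__ == "__main__":
    # str.upper is just a placeholder transliterator for demonstration.
    print(transliterate_continuous("ab  cd\nef", str.upper))
```

This keeps the original whitespace intact, but it only helps if the underlying call really does work on the chosen token size, which is still an open question here.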

kbatsuren avatar Aug 12 '21 15:08 kbatsuren

I think it’d be best to find out WHY it’s happening. There are multiple modules:

  • https://en.wiktionary.org/wiki/Module:ar-translit
  • https://en.wiktionary.org/wiki/Module:ks-Arab-translit
  • https://en.wiktionary.org/wiki/Module:fa-translit
  • https://en.wiktionary.org/wiki/Module:pa-Arab-translit

ar-translit has an unusual tr function: function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit).

I could try to find out how to deal with this, or you might :)

We ought

twardoch avatar Aug 20 '21 20:08 twardoch

I would add that when the language is set as fas (Persian), even single letters are not transliterated.

skalyan91 avatar Aug 29 '21 09:08 skalyan91

Yeah, there are a few different Arabic-script transliterators, and Arabic script as a whole needs some special handling in our Python code.

twardoch avatar Aug 29 '21 22:08 twardoch