alreq
Hyphenation in Arabic script writing systems
I think we need to cover hyphenation in a separate section, apart from the justification material. The most important question for this section to answer is whether it's okay to break a line in the middle of a word, and if so, how.
https://www.tug.org/tugboat/tb27-2/tb87benatia.pdf
Benatia, Mohamed Jamal Eddine, Mohamed Elyaakoubi, and Azzeddine Lazrek. "Arabic text justification." TUGboat 27.2 (2006): 137-146.
Looks like Adobe Illustrator is trying to provide options for hyphenation, but the UI actually doesn't make much sense for Arabic text, so I assume they apply the same Latin logic to Arabic text.
https://helpx.adobe.com/illustrator/using/arabic-hebrew.html
From https://www.w3.org/TR/css-text-3/#hyphens-property
When shaping scripts such as Arabic are allowed to break within words due to ‘break-all’, the characters must still be shaped as if the word were not broken.
Also:
@r12a, @ntounsi, do you know/remember what was the source for this decision? (I'm trying to gather all the sources used and existing practices.)
From http://unicode.org/reports/tr14/
Unicode® Standard Annex #14 UNICODE LINE BREAKING ALGORITHM
Hyphenation, and therefore the SHY, can be used with the Arabic script. If the rendering system breaks at that point, the display—including shaping—should be what is appropriate for the given language. For example, sometimes a hyphen-like mark is placed on the end of the line. This mark looks like a kashida, but is not connected to the letter preceding it. Instead, the appearance of the mark is as if it had been placed—and the line divided—after the contextual shapes for the line have been determined. For more information on shaping, see [UAX9] and Section 9.2, Arabic, of [Unicode].
I'm guessing this was the source for the css-text-3 decision. What do you think?
From https://drafts.csswg.org/css-text/
I looked at a bunch of Persian newspapers from this week and couldn't find a single instance of hyphenation. My guess is that they all turn it off because of the low quality of the existing digital solutions used for typesetting.
From "ketaab-e jom'e" from early 1980s:
Sample 1: "دیوان - سالاری" across lines.
Sample 2: "ویژه - نامهها" and "حتّی - المقدور" across lines, consecutive.
Sample 3: "می - کنند" across pages.
Sample 4: "رودر - رویی" across columns.
Notes:
- All samples show inter-joining-segment hyphenation.
- I couldn't find any instance of intra-joining-segment hyphenation in these publications.
My understanding is that the only modern Arabic-script orthography that allows hyphenation is Uyghur Ereb Yëziqi, whose behavior is what Unicode and CSS are describing. (This is the modern Arabic-based orthography; the older one, also Arabic-based, did not allow hyphenation.) I've been told that at some point (in the 80s?) Persian publications did use hyphenation and, IIRC, it was only allowed at ZWNJ.
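To make the "only at ZWNJ" rule concrete, here is a minimal Python sketch that marks each ZWNJ in a word as a hyphenation opportunity by adding a SOFT HYPHEN (U+00AD). The function name is made up for illustration, and placing the SHY right after the ZWNJ (rather than before it) is my assumption, not something any of the sources above specify.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
SHY = "\u00ad"   # SOFT HYPHEN

def add_soft_hyphens_at_zwnj(word: str) -> str:
    """Return `word` with a soft hyphen inserted after every ZWNJ.

    Hypothetical helper: it only marks break opportunities; whether and
    how a renderer breaks and draws a hyphen there is up to the renderer.
    """
    return word.replace(ZWNJ, ZWNJ + SHY)

# "mi-konand" written with a ZWNJ after the prefix, as in Sample 3 above.
word = "\u0645\u06cc" + ZWNJ + "\u06a9\u0646\u0646\u062f"
marked = add_soft_hyphens_at_zwnj(word)
```

A word without any ZWNJ comes back unchanged, so under this rule it would never be hyphenated.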
AFAIK the Arabic language never had hyphenation. Even in the early stages of the orthography, when breaking inside words was allowed, no hyphen was used, and the break would only happen between unjoined letters (i.e. only after right-joining letters), never between joined ones.

In the second sura (middle left of page), lines 4/5 السمو / ت, lines 7/8 ا / لحسنى, etc.

lines 1/2 و / احدة, lines 2/3 ر / تلنه, etc.
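The rule described above (break only after a right-joining letter, never between joined letters) can be sketched in Python roughly as follows. The letter set is a small, deliberately incomplete, hand-picked list of right-joining letters, and the function name is hypothetical; a real implementation would take joining types from the Unicode ArabicShaping.txt data, which Python's standard library does not expose.

```python
import unicodedata

# Incomplete, hand-picked set of right-joining letters (assumption for
# illustration): alef forms, waw forms, teh marbuta, dal, thal, reh,
# zain, and Persian jeh.
RIGHT_JOINING = set(
    "\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648\u0698"
)
ZWNJ = "\u200c"

def segment_boundaries(word: str) -> list[int]:
    """Return indices i where word[:i] / word[i:] splits between joining segments."""
    boundaries = []
    for i in range(1, len(word)):
        # Never split before a combining mark or before a ZWNJ.
        if unicodedata.category(word[i]).startswith("M") or word[i] == ZWNJ:
            continue
        # Look back past combining marks to the preceding base character.
        j = i - 1
        while j > 0 and unicodedata.category(word[j]).startswith("M"):
            j -= 1
        if word[j] in RIGHT_JOINING or word[j] == ZWNJ:
            boundaries.append(i)
    return boundaries

# The word from the "lines 1/2" example above: waw, alef, hah, dal, teh marbuta.
wahida = "\u0648\u0627\u062d\u062f\u0629"
points = segment_boundaries(wahida)
```

For this word the sketch allows a split after the waw, after the alef, and after the dal, but not between the joined hah and dal.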
@r12a, @ntounsi, do you know/remember what was the source for this decision? (I'm trying to gather all the sources used and existing practices.)
@fantasai is the person to ask.
For Persian...
Well, we have plenty of evidence of hyphenation, at least in movable-type sources, starting from the 1970s and possibly earlier.
Also, I remember seeing it in more recent publications, some of them computer typeset, but only in extreme situations, like very narrow columns.
Also, I remember being taught about it in elementary school (4th grade, IIRC), especially as a writing practice. It was not in the books, AFAIR, but the teacher would teach it in class. I don't remember whether the teacher asked us to break the word at a segment boundary, but I have a fuzzy memory of being taught to break it at a syllable boundary.
I looked at many of the language and writing 1-12 books today hoping to find some mention of hyphenation. No luck.
Based on these, I think it's better to document it, as a last resort solution for some languages, including Persian, with explanation of both inter-segments and inter-syllable methods.
@shervinafshar, @mostafah, what do you think? Do you have better material on this? Maybe in Adib-Soltani book? (I don't have my copy here...)
Agree with Khalid about Arabic language. I've never seen hyphenation.
However, regardless of hyphenation, a situation where a word can be cut in two is found in poetry, between the two half-lines of the same verse. The break doesn't always happen at a non-joining boundary.

Very good point, @ntounsi! I remember we talked about this case once. I'll include this in the Joining section. (#97)
Now, a question would be, should we categorize this behavior under Hyphenation? Or maybe under Justification? (#57)
And, what would you call this behavior in Arabic? Any specific terms you use?
The Arabic term is التدوير (al-ttadwīr). Wikipedia.
Uyghur hyphenates. http://fantasai.inkedblade.net/style/scans/LoC025.png This is why it was added.
From http://fantasai.inkedblade.net/style/scans/LoC025.png
Features:
- Intra-segment line break with hyphenation.
- Inter-segment line break with hyphenation.
- Plenty of ZWNJ.
Hyphenation Character
We also need to note that the character used for hyphenation (CSS' hyphenate-character) is commonly expected to sit on the baseline, similar to TATWEEL, but non-joining by itself.
Possible default characters are:
- U+002D HYPHEN-MINUS
- U+2010 HYPHEN
The preferred character probably depends on which of these exist in the font in use. If U+2010 is available, it's more likely to have the right shape. If not, falling back to U+002D is one option, another being TATWEEL.
No matter which character is used, there needs to be some space between the hyphen and the previous letter (like a narrow space), whether the letter is in join-on-left form or not.
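As a rough illustration of the fallback order suggested above, here is a small Python sketch. It assumes the caller already has the set of code points the font covers (how that set is obtained is left out), and both the function name and the `allow_tatweel` switch are made up for this example.

```python
HYPHEN = "\u2010"        # HYPHEN
HYPHEN_MINUS = "\u002d"  # HYPHEN-MINUS
TATWEEL = "\u0640"       # ARABIC TATWEEL

def pick_hyphenate_character(font_codepoints, allow_tatweel=False):
    """Pick a hyphenation character the font supports, or None.

    Order: U+2010 first, then U+002D, then (only if allowed) TATWEEL.
    `font_codepoints` is a set of integer code points (assumed input).
    """
    candidates = [HYPHEN, HYPHEN_MINUS]
    if allow_tatweel:
        candidates.append(TATWEEL)
    for ch in candidates:
        if ord(ch) in font_codepoints:
            return ch
    return None

# A font that lacks U+2010 but has HYPHEN-MINUS and TATWEEL.
choice = pick_hyphenate_character({0x002D, 0x0640})
```

The spacing between the hyphen and the preceding letter is not handled here; that would be a separate step in layout.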