free-programming-books
Solve some problems with RTL Languages
Here I will talk about some problems with RTL languages and their solutions. I will explain each point here so we can discuss it, and maybe add a section about these problems and solutions to the Guidelines in CONTRIBUTING.
The discussion behind this issue started in these PRs: https://github.com/EbookFoundation/free-programming-books/pull/6706 and https://github.com/EbookFoundation/free-programming-books/pull/6715
What is the issue?
If we have an RTL text here
* [تعلم البرمجة](URL) - Author Name
Note: تعلم البرمجة means "Learn Programming".
It will appear on the website like this:
In this case, we can just wrap it in a div with dir="rtl":
<div dir="rtl">
* [تعلم البرمجة](URL) - Author Name
</div>
Result:
Is that it? No! The monster shows up below 😢
Mixing RTL with LTR languages issue!
The real problem appears when we mix RTL with LTR languages
Case 1
<div dir="rtl">
* [تعلم HTML](URL) - Author Name
</div>
Note: تعلم means "Learn".
Result:
Look, it put the words in the mixer!
Case 2
If we need LTR text to go to the right (both the author name and the title are LTR)
<div dir="rtl">
* [Learn HTML](URL) - Author Name
</div>
Result:
Both words have been swapped!!
Solution?
We can solve these two problems with a Unicode mark called RLM: https://en.wikipedia.org/wiki/Right-to-left_mark
By adding &rlm; after the LTR word that we need to mark as RTL (it will behave like an RTL word)
Solve case 1
<div dir="rtl">
* [تعلم HTML&rlm;](URL) - Author Name
</div>
Result:
We added &rlm; after HTML
Solve case 2
<div dir="rtl">
* [Learn HTML&rlm;](URL) - Author Name
</div>
Result:
You get the point!
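As a rough illustration of the rule above (put RLM after a title that ends in LTR text), here is a minimal Python sketch. The helper names `last_strong_direction` and `append_rlm_if_needed` are hypothetical and not part of this repository's tooling. It works with the raw U+200F character, which is what `&rlm;` decodes to, and relies only on the standard `unicodedata` module.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, the character behind &rlm;


def last_strong_direction(text):
    """Return 'L', 'R', or None for the last strongly directional character."""
    for ch in reversed(text):
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):
            return "R"
    return None


def append_rlm_if_needed(title):
    """Append RLM when the title ends in LTR text (hypothetical helper)."""
    if last_strong_direction(title) == "L":
        return title + RLM
    return title


# repr() makes the otherwise invisible mark show up as \u200f
print(repr(append_rlm_if_needed("Learn HTML")))
print(repr(append_rlm_if_needed("تعلم HTML")))
print(repr(append_rlm_if_needed("تعلم البرمجة")))
```

A title that already ends in Arabic (or already ends with RLM) is returned unchanged, so running the helper twice should be harmless.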
Extra Cases!
Case 1
Try to make C# go to the right!
<div dir="rtl">
* C#
* [تعلم لغة C# الرائعة](URL) - إسم المؤلف
</div>
Note: * [تعلم لغة C# الرائعة](URL) - إسم المؤلف
means * [Learn the Cool C# Language](URL) - Author Name
Result:
Symbols have the same problem when we try to make them RTL, and they have a similar solution 😉, the LRM Unicode mark: https://en.wikipedia.org/wiki/Left-to-right_mark
<div dir="rtl">
* C#&lrm;
* [تعلم لغة C#&lrm; الرائعة](URL) - إسم المؤلف
</div>
We use &lrm; and not &rlm;. Why?
The issue with the symbol is that when we apply the RTL direction to C# to move it to the right, it renders as an RTL word, so the symbol is reordered to the other side.
By adding &lrm; after C#, we mark it as an LTR word, so it renders as an LTR word.
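One way to see why the symbol behaves differently from the letters is to look at the Unicode bidirectional class of each character. The sketch below uses Python's standard `unicodedata.bidirectional`. The interpretation in the comments is my reading of the Unicode Bidirectional Algorithm, not something stated in this thread: `#` is not a strong LTR character, so at the end of an RTL line it tends to follow the RTL direction, and a following LRM (a strong LTR character) keeps it attached to `C`.

```python
import unicodedata

samples = {
    "C": "C",                       # ordinary Latin letter
    "#": "#",                       # symbol at the end of C#
    "Arabic letter teh": "\u062a",  # first letter of the Arabic examples
    "LRM": "\u200e",                # the character behind &lrm;
    "RLM": "\u200f",                # the character behind &rlm;
}

for name, ch in samples.items():
    # L = strong left-to-right, R / AL = strong right-to-left,
    # ET = European number terminator (a weak type, not a strong direction)
    print(f"{name}: {unicodedata.bidirectional(ch)}")
```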
Case 1.1
Both Author Name and Title are LTR, and the title ends with a symbol, as in C#
<div dir="rtl">
* [Learn C#](URL) - Author Name
</div>
Result:
The first step here is simple: just put &rlm; at the end of the title
<div dir="rtl">
* [Learn C#&rlm;](URL) - Author Name
</div>
Result:
But note that the symbol # renders as an RTL word, so it is reordered to the other side.
So we must also put &lrm; right after this symbol, before the &rlm;.
<div dir="rtl">
* [Learn C#&lrm;&rlm;](URL) - Author Name
</div>
Result:
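The two steps of Case 1.1 (LRM after a trailing symbol, then RLM at the end of the title) can be sketched as one small Python function. `mark_ltr_title` is a hypothetical name, and the function assumes the caller already knows the title is LTR and sits inside a dir="rtl" block. It emits the raw characters U+200E and U+200F rather than the HTML entities.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK (&lrm;)
RLM = "\u200f"  # RIGHT-TO-LEFT MARK (&rlm;)
STRONG = {"L", "R", "AL"}  # bidi classes with a direction of their own


def mark_ltr_title(title):
    """Return an LTR title with the marks described in Case 1.1.

    Assumption: the title is LTR text placed inside a dir="rtl" block.
    """
    marked = title
    # A trailing symbol such as '#' has no strong direction, so pin it
    # to the preceding LTR word with LRM first.
    if marked and unicodedata.bidirectional(marked[-1]) not in STRONG:
        marked += LRM
    # Then close the title with RLM, as in the plain "Learn HTML" case.
    return marked + RLM


print(repr(mark_ltr_title("Learn C#")))
print(repr(mark_ltr_title("Learn HTML")))
```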
Case 2
If the Title is in English and the Author Name is in Arabic
* [Learn HTML](URL) - إسم المؤلف
Result:
It is enough to set the direction to RTL, without adding any Unicode mark
<div dir="rtl">
* [Learn HTML](URL) - إسم المؤلف
</div>
Result:
Case 3
Sometimes we add some information like (:construction: *in process*) after the author name
<div dir="rtl">
* [عنوان بالعربي](URL) - Author Name (meta data)
* [Title In LTR&rlm;](URL) - Author Name (meta data)
</div>
Result:
It looks correct, but we read from right to left, so it would be nicer if this information were on the left, so that we read the author name first and then the information.
So to solve this, we just put &rlm; after the name
<div dir="rtl">
* [عنوان بالعربي](URL) - Author Name&rlm; (meta data)
* [Title In LTR&rlm;](URL) - Author Name&rlm; (meta data)
</div>
Result:
If we add a section about this solution to the Guidelines in CONTRIBUTING (after we finish discussing it here, of course), other contributors can do the same with their own RTL languages.
Thanks for adding this. We can leave it open for a while.
As commented in #6715, these marks, whether written as an HTML entity or as a raw Unicode character, break the alphabetize plugin, and it is even worse when they are placed at the beginning of a sentence (for the reason, see https://github.com/vhf/remark-lint-alphabetize-lists/blob/ee5f968040acf941c9c4d61fefb2bb1e3b1e8a7b/lib/alphabetical-list-items.js#L5-L14)
From Windows11 charmap.exe
Moreover, the non-printable character should be used instead of the HTML entity. Remember that Markdown markup should be HTML-agnostic.
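To illustrate why the marks can interfere with alphabetical ordering, here is a small Python sketch. It is not how remark-lint-alphabetize-lists works internally; it only shows that a plain string comparison sees the entity text or the invisible character, and that one possible normalization (an assumption on my side) is to decode the entity and drop both marks before comparing. `sort_key` is a hypothetical helper.

```python
import html

MARKS = {"\u200e", "\u200f"}  # raw LRM and RLM characters


def sort_key(item):
    """Comparison key that ignores &lrm;/&rlm; entities and raw marks."""
    text = html.unescape(item)  # "&rlm;" becomes "\u200f"
    return "".join(ch for ch in text if ch not in MARKS).casefold()


items = ["&rlm;Zed", "Alpha&rlm;", "\u200fBeta"]

# Plain sorting compares "&" and "\u200f" as ordinary characters,
# so the leading marks decide the order.
print([repr(i) for i in sorted(items)])

# Sorting with the normalizing key orders by the visible text.
print([repr(i) for i in sorted(items, key=sort_key)])
```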
@davorpa I can make regex patterns for all these cases. Would that help you detect them automatically, or something like that, in the future?
Go ahead :wink:. It can be helpful to any maintainer :heart:
@AhmedElTabarani Hello sir, can I work on this?
About the regex patterns? OK, no problem at all.
I was working on it, but I have been very busy these past weeks.
I had decided to write a JavaScript script to detect all of these cases, plus some unit tests to keep everything organized.
This is the last thing I ended up with; maybe it will help you.
Case 0 (It is enough to make a div with dir='rtl')
* [تعلم البرمجة](URL) - Author Name
Regex:
^\* \[[^\w\d\?><;,\{\}\[\]\-_\+=!@\#\$%^&\*\|\']+\]\(.+\) - .+(?<!\(.+\))$
Case 1
* [تعلم HTML](URL) - Author Name
Regex:
^\* \[[\u04c7-\u0591\u05D0-\u05EA\u05F0-\u05F4\u0600-\u06FF\d\?><;,\{\}\[\]\-_\+=!@\#\$%^&\*\|\' ]+\w+\]\(.+\) - [\w\ ]+$
Case 2
* [Learn HTML](URL) - Author Name
Regex:
^\* \[[^\u04c7-\u0591\u05D0-\u05EA\u05F0-\u05F4\u0600-\u06FF\-]+\w\]\(.+\) - [\w\ ]+$
Extra Case 1
* C#
* [تعلم لغة C# الرائعة](URL) - إسم المؤلف
Extra Case 1.1
* [Learn C#](URL) - Author Name
Extra Case 2 (It is enough to make a div with dir='rtl')
* [Learn HTML](URL) - إسم المؤلف
Extra case 3
* [عنوان بالعربي](URL) - Author Name (meta data)
* [Title In LTR&rlm;](URL) - Author Name (meta data)
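For anyone who wants to try the patterns, here is a minimal Python sketch that runs a lightly simplified transcription of the Case 1 and Case 2 regexes against the sample lines. It is Python rather than the JavaScript script mentioned above, the `classify` helper is hypothetical, and `re.ASCII` is used on the assumption that `\w` should only match ASCII word characters, as it does in JavaScript. The Case 0 pattern is left out because it uses a variable-length lookbehind, which Python's `re` module does not support.

```python
import re

# Case 1: title starts with RTL text and ends with an LTR word.
CASE_1 = re.compile(
    r"^\* \[[\u04c7-\u0591\u05D0-\u05EA\u05F0-\u05F4\u0600-\u06FF"
    r"\d\?><;,\{\}\[\]\-_\+=!@\#\$%^&\*\|\' ]+\w+\]\(.+\) - [\w\ ]+$",
    re.ASCII,
)

# Case 2: title and author name are both LTR.
CASE_2 = re.compile(
    r"^\* \[[^\u04c7-\u0591\u05D0-\u05EA\u05F0-\u05F4\u0600-\u06FF\-]+"
    r"\w\]\(.+\) - [\w\ ]+$",
    re.ASCII,
)


def classify(line):
    """Return 'case 1', 'case 2', or None for a single list item."""
    if CASE_1.match(line):
        return "case 1"
    if CASE_2.match(line):
        return "case 2"
    return None


for sample in (
    "* [تعلم HTML](URL) - Author Name",
    "* [Learn HTML](URL) - Author Name",
    "* [تعلم البرمجة](URL) - Author Name",
):
    print(classify(sample))
```

Lines that return None here either need no mark (Case 0) or belong to one of the extra cases, which these two patterns do not cover.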
The main RTL languages are Arabic, Persian and Hebrew... which are only 3 out of all the languages translated on this repo... might be better to have a special section for these languages... as it is not relevant for all the LTR ones.
Have you tried the following?
- Update the CONTRIBUTING.md file to include a section for RTL languages, explaining the issues, solutions, and usage of Unicode marks (RLM and LRM) for different cases.
- Create a separate section or a separate file specifically for Arabic, Persian, and Hebrew languages in the repository, as @avipars suggested. This would help maintain a better organization for RTL languages and make it easier to manage content for these languages separately.
some good ideas in this issue. Would welcome a PR.
Does this issue still need to be fixed?
Can I work on this issue? Thank you...