
UBA support

Open avidseeker opened this issue 1 year ago • 6 comments

The Unicode Bidirectional Algorithm (UBA, Unicode Standard Annex #9) specifies how characters should be positioned in text that contains right-to-left characters, such as Arabic.
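For readers unfamiliar with UBA: the algorithm operates on a Bidi_Class value that the Unicode Character Database assigns to every code point. As a minimal illustration (Python is used only because its standard unicodedata module exposes this property; bidi_class_table is a hypothetical helper, not anything in less), the classes of a few relevant characters can be listed like this:

```python
import unicodedata

def bidi_class_table(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, Bidi_Class) for each character."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, unicodedata.bidirectional(ch)))
    return rows

# Latin letter, Arabic QAF, Arabic FATHA (a vowel mark), a digit, a space,
# and the invisible RIGHT-TO-LEFT MARK and LEFT-TO-RIGHT MARK.
sample = "L\u0642\u064e1 \u200f\u200e"
for code_point, name, bidi in bidi_class_table(sample):
    print(f"{code_point}  {name:<22}  {bidi}")
```

Strong classes (L for Latin, R and AL for Hebrew and Arabic letters) drive the direction; NSM marks such as the Arabic vowel signs take on the class of the letter they attach to.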

Here's some example text:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.

.قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ
رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ
فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ
مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ
مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَجَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ
مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى
بَلَّ دَمْعِيَ مِحْمَلِي

And here is how it should be displayed in less:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

.قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَجَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى بَلَّ دَمْعِيَ مِحْمَلِي

avidseeker · Feb 29 '24

Does your terminal support UBA? Less is not a layout engine.

polluks · Mar 01 '24

Yes, I tried it in Tilda, but I don't think this is related to the terminal. The terminal's job is to render whatever a CLI program prints to it. In Vim, for example, you can paste this text and then run :set rightleft, which changes the direction of the text. So my point is that text direction is managed by the program, not the terminal.

avidseeker · Mar 01 '24

I don't read Arabic, but from what I can see, the only difference between what less currently displays and your "how it should be displayed" example is that the Arabic text is right-justified in the latter. Other than that, the same characters appear, written in the same direction, in both cases. Is right-justification the only thing you think should be changed? If so, I don't understand why the (quite complex) Unicode Bidirectional Algorithm was mentioned.

gwsw · Mar 07 '24

> the only difference is right-justification

It's a little more complex than that. Unicode classifies every character by bidirectional type, and those classes determine whether text should be aligned right or left. Maybe the URL I attached is too technical; take a look at Unicode Character Property instead. For example, inserting a RIGHT-TO-LEFT MARK (U+200F) at the beginning of a Latin line should make that line RTL even though it is written in an LTR language, and vice versa for a LEFT-TO-RIGHT MARK (U+200E) in Arabic text.
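The mark behaviour described above follows from UBA rules P2 and P3: a paragraph's base direction comes from its first strong character (Bidi_Class L, R or AL), skipping anything inside isolates, and the invisible marks are themselves strong characters (RLM has class R, LRM has class L). A minimal sketch under that reading; paragraph_direction is a hypothetical name, and this covers only direction detection, not the full reordering:

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}

def paragraph_direction(text: str, default: str = "ltr") -> str:
    """Approximate UAX #9 rules P2-P3: the first strong character decides.

    Characters between an isolate initiator (LRI/RLI/FSI) and its matching
    PDI are skipped, as P2 requires. Returns "rtl" for Bidi_Class R or AL,
    "ltr" for L, and `default` if the text has no strong character.
    """
    depth = 0  # current isolate nesting depth
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ISOLATE_INITIATORS:
            depth += 1
        elif bidi == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    return default

print(paragraph_direction("Lorem ipsum"))        # ltr
print(paragraph_direction("\u200fLorem ipsum"))  # rtl: RLM is a strong R character
```

The sample Arabic paragraph starts with a full stop, which is a weak character (class CS) and is skipped, so that paragraph is still detected as RTL.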

Right-justification alone would be a good start and would cover most use cases.
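A sketch of the right-justification step, assuming the line's direction is already known (for example from a first-strong check) and that the terminal lays each line out left to right, as a terminal without bidi support does; a bidi-aware terminal may move the padding. justify_line and display_width are hypothetical names, and the width estimate ignores tabs and East Asian wide characters:

```python
import unicodedata

# Nonspacing/enclosing marks (e.g. Arabic vowel signs) and format characters
# (e.g. RLM/LRM) do not occupy a column of their own.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}

def display_width(text: str) -> int:
    """Rough column count: 1 per character, 0 for marks and format characters.

    Assumes no tabs and no East Asian wide characters.
    """
    return sum(0 if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else 1
               for ch in text)

def justify_line(line: str, width: int, rtl: bool) -> str:
    """Pad an RTL line on the left so that it ends at the right edge.

    LTR lines, and lines already as wide as the screen, are returned unchanged.
    """
    if not rtl:
        return line
    padding = max(width - display_width(line), 0)
    return " " * padding + line
```

In a pager the width would come from the terminal (in Python, shutil.get_terminal_size().columns), and each RTL line would be padded this way before being written out.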

avidseeker · Mar 07 '24

I think I understand the RTL mark stuff, but I'm confused about why the Arabic text already seems to be displayed in the correct order in less (and, for that matter, in cat). For example, the first Arabic character in the file is QAF (UTF-8 bytes D9 82). Since less is ignorant of RTL ordering, I would have expected that character to appear as the leftmost character in the first Arabic line, but it appears as the rightmost character. How is that happening?

gwsw · Mar 11 '24

You're right, and this is where the terminal comes into play. Terminals without Arabic letter-shaping support (such as Alacritty and st) actually behave the way you describe.


But if the terminal does support Arabic, the characters are shaped correctly, starting from the right side.
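To illustrate the storage-versus-display distinction: QAF is the first Arabic code point in the file (the bytes D9 82), and only the display step puts it at the right edge. A simplified sketch, assuming a line that is entirely right-to-left, where UBA's reordering rule L2 reduces to reversing the run; it keeps each vowel mark after its base letter and does no letter shaping, which is a separate job. clusters and visual_order_pure_rtl are hypothetical helpers:

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Group each character with the nonspacing marks (category Mn) after it."""
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.category(ch) == "Mn":
            groups[-1] += ch  # attach the vowel mark to the preceding letter
        else:
            groups.append(ch)
    return groups

def visual_order_pure_rtl(text: str) -> str:
    """Return left-to-right screen order for a run that is entirely RTL.

    Simplification: the full UAX #9 rule L2 handles mixed levels, numbers and
    embeddings; for a single RTL run it amounts to reversing the run.
    """
    return "".join(reversed(clusters(text)))

qaf = "\u0642"
print(qaf.encode("utf-8").hex(" "))  # d9 82

# First word of the sample text, in storage (logical) order.
word = "\u0642\u0650\u0641\u064e\u0627"
print([f"U+{ord(c):04X}" for c in visual_order_pure_rtl(word)])
```

A terminal without bidi support draws the string in storage order, putting QAF leftmost; in the reversed order QAF comes last, i.e. rightmost on screen.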

avidseeker · Mar 11 '24