rust-html2text

Support RTL

Open NightMachinery opened this issue 6 years ago • 9 comments

Like this one:

```
Found 1 items, similar to سلام.
-->Moin
-->سلام

<p align=right dir=rtl>(سَ) [<font color="green"> ع.</font> ] (<font color="green">مص ل.</font>)<br><font color="#7030a0">۱-</font> درود گفتن.<br><font color="#7030a0">۲-</font> بی گزند شدن.<br><font color="#7030a0">۳-</font> گردن نهادن.<br>~ علیک درود بر تو باد.<br>~ علیکم درود بر شما.</p>
```

NightMachinery avatar Oct 29 '18 13:10 NightMachinery

Hi,

I definitely would like to get RTL text right, but might need some help; I don't know any RTL languages, but am willing to read documentation! If there is a problem (I'm not sure if closing the ticket was a mistake or not), could you provide a complete (though ideally small) example HTML file and roughly how it should come out?

Thanks,

Chris

jugglerchris avatar Oct 29 '18 21:10 jugglerchris

Hi! Thanks! I closed the ticket because I was no longer sure whether this had anything to do with your software (as opposed to the terminal itself). I had opened an issue in Kitty, and they said it was the software's job to render RTL correctly, not the terminal's. But I'm increasingly skeptical, as RTL text doesn't render correctly even at their prompt. Examples are easy to generate yourself: go to any Wikipedia page and change the language to Arabic, فارسی, Persian, Farsi, عربی, or something similar. Chrome (and really any modern browser) will render it correctly without any configuration on your part. Here is a link: https://fa.m.wikipedia.org/wiki/سیب

NightMachinery avatar Oct 30 '18 17:10 NightMachinery

Thanks. I passed that page through html2term and it doesn't look too bad (to me) in pterm/PuTTY:

[image: screenshot_rtl]

However (assuming it looks right), it's for the wrong reasons. html2text doesn't currently understand the dir="rtl" attribute, so the table columns are in the wrong order... but (presumably) the bidirectional text algorithm in pterm ends up reversing the entire line. (This would be more obvious if the columns weren't exactly the same width - the vertical bar between the columns is offset compared to the top/bottom borders.)

I'm not completely sure what the most correct thing for html2text to be doing is - I think it should at least understand dir="rtl" and output some characters for swapping direction explicitly.
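As a rough illustration of what "understand dir and output characters for swapping direction" could look like, here is a minimal sketch in Rust. It wraps a block's text in the Unicode directional isolate characters (RLI U+2067, LRI U+2066, PDI U+2069). The `Dir` type and both function names are hypothetical and not part of html2text's API, and whether a given terminal honours these characters is an open question.

```rust
// Unicode directional isolate characters (see UAX #9).
const LRI: char = '\u{2066}'; // LEFT-TO-RIGHT ISOLATE
const RLI: char = '\u{2067}'; // RIGHT-TO-LEFT ISOLATE
const PDI: char = '\u{2069}'; // POP DIRECTIONAL ISOLATE

/// Hypothetical direction type for illustration; not html2text's API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dir {
    Ltr,
    Rtl,
}

/// Parse the value of an HTML `dir` attribute (case-insensitive).
/// Unknown values such as "auto" are treated as "no explicit direction".
fn parse_dir(value: &str) -> Option<Dir> {
    match value.trim().to_ascii_lowercase().as_str() {
        "rtl" => Some(Dir::Rtl),
        "ltr" => Some(Dir::Ltr),
        _ => None,
    }
}

/// Wrap `text` in an isolate matching `dir`, or return it unchanged
/// when no explicit direction was given.
fn wrap_with_direction(text: &str, dir: Option<Dir>) -> String {
    match dir {
        Some(Dir::Rtl) => format!("{}{}{}", RLI, text, PDI),
        Some(Dir::Ltr) => format!("{}{}{}", LRI, text, PDI),
        None => text.to_string(),
    }
}
```

For the `<p dir=rtl>` example at the top of this issue, `wrap_with_direction(text, parse_dir("rtl"))` would emit the paragraph text between RLI and PDI, leaving the actual reordering to whatever displays the output.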

jugglerchris avatar Nov 03 '18 21:11 jugglerchris

I'm re-opening but it may take a little while to work out the best way to address this (and it sounds like different terminals behave differently, which won't help).

jugglerchris avatar Nov 04 '18 10:11 jugglerchris

@jugglerchris Can you share a screenshot of the textual content, not the table? I have macOS so I cannot test on pterm. My results with Kitty were unreadable because the word order got reversed.

NightMachinery avatar Nov 04 '18 10:11 NightMachinery

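Sure: [image] https://user-images.githubusercontent.com/1644842/47963227-ceecc980-e020-11e8-8329-3b756d939cdb.png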

jugglerchris avatar Nov 04 '18 11:11 jugglerchris

Well, this is readable and nice. Its worst problem is that it doesn't render half-spaces correctly (it renders می‌شود as میشود), which is perhaps a font problem? Can you test this on kitty, too, and determine whether the issue needs to be solved here or by kitty? I'll also test it out in Hyper.

NightMachinery avatar Nov 04 '18 11:11 NightMachinery

I'll try in kitty at some point (but I haven't got it handy at the moment). I've just tried copying your two examples and pasting them into pterm - they both show the same as you say. Does kitty get that part right, and if so what font is it using? I'm a bit concerned that it might be tricky in a terminal with fixed-size character cells, but perhaps it's no different to other combining characters (I looked at the bytes and can see that the difference is a U+200C zero width non joiner).

(Edit: it was zero width non joiner)
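The byte-level check described above can be reproduced with a small Rust sketch. It locates U+200C (ZERO WIDTH NON-JOINER) in a string and shows what the string looks like with it stripped, which is how the second example in the previous comment appears. The function names are made up for illustration.

```rust
/// U+200C ZERO WIDTH NON-JOINER, the Persian "half-space".
const ZWNJ: char = '\u{200C}';

/// Byte offsets (in the UTF-8 encoding of `s`) at which a ZWNJ starts.
fn zwnj_byte_offsets(s: &str) -> Vec<usize> {
    s.char_indices()
        .filter(|&(_, c)| c == ZWNJ)
        .map(|(i, _)| i)
        .collect()
}

/// Return `s` with every ZWNJ removed.
fn strip_zwnj(s: &str) -> String {
    s.chars().filter(|&c| c != ZWNJ).collect()
}
```

In UTF-8 the ZWNJ is the three bytes `E2 80 8C`, so it is easy to spot in a hex dump even though it has no width on screen.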

jugglerchris avatar Nov 05 '18 22:11 jugglerchris

Although this is still an issue, I'm not working on it at the moment; I don't know of a way to get terminals to do the right thing with mixed LTR/RTL text, particularly in cases like tables, where some rows may have RTL and others LTR (for example the horizontal borders between two rows of cells). I also suspect different terminals behave differently - I'm open to multiple backend implementations if that's what's required.
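To make the mixed-rows problem concrete, here is a rough Rust sketch that guesses the base direction of each output line from its first "strong" character. This is a coarse approximation of one step of the Unicode Bidirectional Algorithm, not an implementation of it: the code point ranges cover only the Hebrew and Arabic blocks (and treat everything in them, including digits, as RTL), and the names are hypothetical rather than anything in html2text.

```rust
#[derive(Debug, PartialEq)]
enum LineDir {
    Ltr,
    Rtl,
    Neutral, // no letters at all, e.g. a table border
}

/// Coarse check for Hebrew/Arabic script code points.
fn is_rtl_char(c: char) -> bool {
    matches!(
        c as u32,
        0x0590..=0x05FF       // Hebrew
            | 0x0600..=0x06FF // Arabic
            | 0x0750..=0x077F // Arabic Supplement
            | 0x08A0..=0x08FF // Arabic Extended-A
            | 0xFB1D..=0xFDFF // Hebrew/Arabic presentation forms
            | 0xFE70..=0xFEFC // Arabic Presentation Forms-B
    )
}

/// Guess a line's base direction from its first letter-like character.
fn line_direction(line: &str) -> LineDir {
    for c in line.chars() {
        if is_rtl_char(c) {
            return LineDir::Rtl;
        }
        if c.is_alphabetic() {
            return LineDir::Ltr;
        }
    }
    LineDir::Neutral
}
```

With this heuristic a border row such as `+------+------+` comes out as `Neutral`, while the rows above and below it may be `Rtl` or `Ltr`, which is the situation where a terminal that reorders each line independently can misalign the cell boundaries.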

jugglerchris avatar Mar 29 '19 21:03 jugglerchris