
feature-request: wide-character support (UTF-8)

Open Tux opened this issue 9 years ago • 7 comments

Currently, elvis shows each multi-byte character as multiple separate bytes

» shows as Â»
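
The effect can be reproduced outside Elvis. The following is only a minimal Python sketch, assuming that the editor treats every byte as one ISO 8859-1 character. That is an assumption about the cause, not something taken from the Elvis source.

```python
# » is U+00BB; UTF-8 encodes it as the two bytes 0xC2 0xBB.
utf8_bytes = "»".encode("utf-8")

# Reading each byte as a separate ISO 8859-1 character yields two glyphs.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex(" "))  # c2 bb
print(misread)              # Â»
```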

It feels like a betrayal to have to use gvim instead of elvis to edit these files.

Tux avatar Mar 17 '15 11:03 Tux

Yes, that's probably elvis' biggest shortcoming now. But it's not something I can fix: I tried, without success. That's probably something Steve could do better than anyone else, but it looks like he currently does not have the time to do so.

mbert avatar Mar 17 '15 19:03 mbert

Elvis would get a second life if somebody could devote some time to it. The lack of UTF support is slowly killing our beloved editor, as people turn to vim as a consequence...

simplehex avatar Feb 23 '16 23:02 simplehex

Agreed. Somebody will just have to do it.

mbert avatar Feb 24 '16 06:02 mbert

I have recently spent some time with the source code with this in mind, and I think that the necessary changes will affect large parts of Elvis. Although characters are already stored in int here and there (which would be wide enough to hold a decoded UTF-8 character), major parts (especially the basic ones) are quite char based.

So what are the possibilities to edit files with UTF-8 encoded contents with Elvis?

You are using a UTF-8 terminal:

You will certainly use a UTF-8 font as well. Start Elvis with set nonascii=all, and there should be no problem reading, entering and displaying any non-ASCII character.

However, because UTF-8 encoded characters can be two, three or four bytes long, there is a kind of trailing whitespace with these, because Elvis displays fewer characters than the bytes it received. A screen refresh (^R in input mode, ^L otherwise) corrects the display, but the line will remain longer than the number of displayed characters. This is annoying but should not have too negative effects.
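
How much surplus a given line produces can be estimated by comparing its byte count with its character count. This is a rough Python sketch rather than Elvis code; the helper name surplus_cells is made up for illustration, and it assumes that every character occupies exactly one screen column (no combining or double-width characters).

```python
def surplus_cells(line: str) -> int:
    """Bytes in the UTF-8 encoding of `line` minus its number of characters.

    Assumption: one character occupies one screen column.
    """
    return len(line.encode("utf-8")) - len(line)


print(surplus_cells("plain ASCII"))  # 0
print(surplus_cells("»quoted«"))     # 2
print(surplus_cells("€"))            # 2
```

A pure ASCII line has no surplus; each two-byte character adds one cell, each three-byte character two, and each four-byte character three.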

All reading, input and writing will be in UTF-8 mode.

You are using an ISO terminal:

You were screwed, so far, but there is an experimental patch that allows UTF-8 to ISO conversion - back and forth! (For the moment, ISO 8859-1 only, but if there is interest, I am willing to change this to support other or even all ISO-8859 encodings.)

The nonascii option can be set to convert and will change UTF-8 encoded input to ISO encoding. This will affect all data reading, but only the way the characters are displayed. They remain actually stored with their true UTF-8 value. (Again, there is a kind of trailing whitespace, because Elvis displays fewer characters than the bytes it received; see above.)

Any ISO character input will be converted into UTF-8 encoding (but still displayed in ISO encoding), so that UTF-8 encoded files can be edited without violating their encoding and the users see their typed characters.

Writing will be - unmodified - in UTF-8 mode.
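
The two conversion directions described here can be sketched in a few lines of Python. This is not the patch itself, only an illustration of the idea; the function names are hypothetical, and replacing characters that ISO 8859-1 cannot represent with '?' is an assumption, not necessarily what the patch does.

```python
def utf8_to_display(stored: bytes) -> bytes:
    """Convert stored UTF-8 bytes to ISO 8859-1 bytes for the terminal.

    The stored bytes are left untouched. Characters outside ISO 8859-1
    come out as '?' (an assumption made for this sketch).
    """
    return stored.decode("utf-8").encode("iso-8859-1", errors="replace")


def input_to_utf8(typed: bytes) -> bytes:
    """Convert ISO 8859-1 bytes typed by the user to UTF-8 for storage."""
    return typed.decode("iso-8859-1").encode("utf-8")


stored = input_to_utf8(b"\xbb")   # the user types » on an ISO terminal
print(stored)                     # b'\xc2\xbb'
print(utf8_to_display(stored))    # b'\xbb'
```

Note that utf8_to_display raises UnicodeDecodeError if the stored bytes are not valid UTF-8; a real implementation would have to decide how to show such bytes.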

What else could be done?

Recode UTF-8 files to your terminal's ISO encoding, pass the recoded file to Elvis and recode it back to UTF-8 after saving. Could be automated, but ugh!
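
For what it's worth, the recoding step itself is small. Below is a minimal Python sketch, assuming ISO 8859-1 as the terminal encoding; the helper name recode and the file names in the note are placeholders. It works on raw bytes so that line endings are not altered, and it raises an error rather than silently dropping characters that the target encoding cannot represent.

```python
from pathlib import Path


def recode(src: Path, dst: Path, from_enc: str, to_enc: str) -> None:
    """Re-encode the file `src` into `dst`.

    Raises UnicodeError if `src` is not valid `from_enc` or if it
    contains a character that `to_enc` cannot represent.
    """
    text = src.read_bytes().decode(from_enc)
    dst.write_bytes(text.encode(to_enc))
```

Before editing one would call recode(Path("notes.txt"), Path("notes.iso"), "utf-8", "iso-8859-1"), then edit notes.iso with Elvis, and afterwards call recode(Path("notes.iso"), Path("notes.txt"), "iso-8859-1", "utf-8").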

Summary

Unless someone with a lot of time rewrites Elvis, there will be no native, full and true UTF-8 support. But as long as you work with only one ISO character set, it is not completely impossible to edit UTF-8 encoded files with Elvis - in both UTF-8 and ISO terminals.

ib avatar Jul 23 '20 18:07 ib

Is anyone still using Elvis in an ISO 8859 terminal and is interested in the experimental patch mentioned above? (If so, which ISO 8859 encoding?)

ib avatar Jul 24 '20 12:07 ib

Is anyone still using Elvis in an ISO 8859 terminal and is interested in the experimental patch mentioned above? (If so, which ISO 8859 encoding?)

I am certainly not. Thanks also for your investigation on UTF8. I tried to implement UTF8 support several years ago, but due to my ignorance of termcap programming I saw no chance of ever completing it.

Seems like people who really need to edit Unicode files will have to use vim instead.

mbert avatar Jul 24 '20 12:07 mbert

I (almost) never use elvis in a plain terminal environment, but (almost) always as elvis -Gx11 -fork (which is my alias for vi). I just accept that my multibyte characters use multiple positions and are not recognizable. It is what it is. Elvis still accepts my Compose UTF-8 characters, which then show as junk, but I just know it is valid junk.

Tux avatar Jul 24 '20 13:07 Tux