
No 1251 encoding and no non-English encodings supported.

Open faerot opened this issue 2 years ago • 12 comments

In Windows, FAR's editor defaulted to the 1251 encoding. I get that Linux uses different encodings, but considering that FAR is originally a Windows program, a lot of users are coming from Windows with many text files in non-English Windows encodings (especially comments in source files). It would be helpful to have at least 1251 alongside KOI-8. Ideally it would be great to have the full spectrum of encodings, as in FAR 3.

faerot avatar Nov 20 '23 15:11 faerot

For newly created files, the default editor encoding can be changed in F9 -> Options -> Editor settings. It now defaults to UTF-8.

For a file already opened in the viewer or editor, you can choose a codepage with Shift-F8, or quickly cycle through UTF -> ANSI -> OEM with F8. The OEM and ANSI code pages used by the F8 and Shift-F8 switch are defined by the text file ~/.config/far2l/cp (first line is OEM, second is ANSI) or, in its absence, by the LC_CTYPE environment variable.
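A minimal sketch of that setup, assuming the ~/.config/far2l/cp path and the "first line OEM, second line ANSI" format described above; CP866/CP1251 are chosen here only as an example pair for Russian text:

```shell
# Create the codepage hint file far2l reads for the F8 OEM/ANSI switch.
# Example values for Russian text: CP866 as OEM, CP1251 as ANSI.
mkdir -p ~/.config/far2l
printf '866\n1251\n' > ~/.config/far2l/cp
cat ~/.config/far2l/cp
```

Restart far2l afterwards so it re-reads the file.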

Some details (in Russian) at https://github.com/akruphi/far2l/wiki#cp

akruphi avatar Nov 20 '23 17:11 akruphi

In Windows, FAR's editor defaulted to the 1251 encoding. I get that Linux uses different encodings, but considering that FAR is originally a Windows program, a lot of users are coming from Windows with many text files in non-English Windows encodings (especially comments in source files). It would be helpful to have at least 1251 alongside KOI-8. Ideally it would be great to have the full spectrum of encodings, as in FAR 3.

First of all, FAR 3 is a completely separate (Windows) program, different from far2l.

Second, you are free to add the full spectrum of encodings yourself if you feel that way. This is not an issue for Ubuntu users, who use Unicode-based encodings and are not stuck with old, really old, I mean MS-DOS-era stuff. All the problems with encodings come from the days of single-byte character sets on a VGA screen, low memory, and a BIOS that lacked everything that is standard nowadays.

In Novell NetWare, DOS, and MS-DOS 4 we had to use special encodings simply because the hardware could not show arbitrary characters on screen, that's all. Any character that did not fit into the 7-bit ASCII table (Latin letters plus a handful of pseudographic characters) was automatically rendered as '?' or an unknown-symbol glyph.

Then some genius discovered that, to read text in your own language, you could use an MS-DOS TSR program which intercepted direct writes to video memory and keyboard input and reused part of the ASCII table for local characters, showing you "fake" data rather than the real bytes. So you could happily read Asimov's sci-fi off the screen in your own KOI8-R Russian characters, or in Windows-1251 characters if you were lucky enough to install Windows 3.11 with Russian keyboard-layout support.

The whole idea of that fake data, which existed nowhere except in the upper memory segment set up by HIMEM.SYS, until the window manager erased it with its own ASCII encoder/decoder for other languages, was completely wrong. The bytes in the input file stay the same, just codes like 0xFA 0xFF, but how those binary symbols appear on screen depends entirely on the system and its interpretation of those characters. And you could not realistically read Unicode documents on DOS anyway: with about 43 KB of conventional memory and 20-30 KB of video memory, you would probably not even fit the whole Unicode character table into 4K of memory without plentiful RAM, which was also SLOW, I mean really SLOW, not as slow as video memory access (haha), but still.

So, from my point of view, there is no point in going back to the old-school approach and supporting character maps that are not supported directly by the operating system. All OSes except Windows have long switched to Unicode internally, while Windows still fights the ghosts and shadows of its past: UTF-8, UTF-16, UTF-32, UTF-16 with or without BOM, all those crazy things still exist. When you simply try to save a Visual Studio solution file, it complains that, although all files are UTF-8, there is a LITTLE problem saving as UTF-8 WITH BOM, you know. Why the hell should I, as a user, care about BOM versus BOM-less file formats? Just do it. It is your OS, your compiler, your files.

That is why I consider support for many, many file encodings, with special bells and whistles on top, not worth having unless you actually need those text encodings.

All text matters, then. Probably someone will use this to read that mythical KOI8-R text on a BBS, of course.

default-writer avatar Nov 23 '23 02:11 default-writer

https://ru.wikipedia.org/wiki/%D0%A7%D0%B5%D1%80%D0%BD%D0%BE%D0%B2,_%D0%90%D0%BD%D0%B4%D1%80%D0%B5%D0%B9_%D0%90%D0%BB%D0%B5%D0%BA%D1%81%D0%B0%D0%BD%D0%B4%D1%80%D0%BE%D0%B2%D0%B8%D1%87

default-writer avatar Nov 23 '23 02:11 default-writer

https://datatracker.ietf.org/doc/html/rfc1489

default-writer avatar Nov 23 '23 02:11 default-writer

General problem with encodings:

What you see is not what you get.

Unless you are using Unicode (and even then, its encodings are variable-length), a character you see on screen is not mapped 1:1 to the bytes in the source file, and that is why these are called encodings and code pages. It is just a mysterious way of decoding somehow-encoded data, like a séance performed over KOI8-R bytes to make them readable on a system that does not support Cyrillic code pages.
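The point above can be demonstrated directly: the bytes never change, only the decoding does. A sketch assuming a glibc/GNU iconv is available and the shell runs in a UTF-8 locale:

```shell
# The same six bytes are "Привет" under KOI8-R but mojibake under CP1251.
printf 'Привет' | iconv -f UTF-8 -t KOI8-R | od -An -tx1   # the raw bytes
printf 'Привет' | iconv -f UTF-8 -t KOI8-R \
  | iconv -f KOI8-R -t UTF-8                                # readable again
printf 'Привет' | iconv -f UTF-8 -t KOI8-R \
  | iconv -f CP1251 -t UTF-8                                # mojibake
```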

default-writer avatar Nov 23 '23 02:11 default-writer

Occultists and dark-lord magic are still with us, no worries:

image

default-writer avatar Nov 23 '23 02:11 default-writer

http://dgmag.in/N44/DowngradeN44.pdf

default-writer avatar Nov 23 '23 03:11 default-writer

this is not an issue for Ubuntu users, who use Unicode-based encodings and are not stuck with old, really old, I mean MS-DOS-era stuff.

Since when did "old" become a problem, especially in the Unix world? I work with an old codebase that originates from pre-Unicode days and still feeds me to this day. Some comments in that code are in, as you say, those almost-MS-DOS encodings. I need a convenient way to work with that encoding, not a history lesson, thanks.

faerot avatar Nov 23 '23 19:11 faerot

I have a similar issue with CP1251/DOS encodings. Is there any workaround to make far2l display these encodings correctly, at least in the editor?

LanThrusteR avatar Feb 07 '24 14:02 LanThrusteR

@LanThrusteR why didn't the advice in https://github.com/elfmz/far2l/issues/1916#issuecomment-1819526603 work for you?

akruphi avatar Feb 07 '24 18:02 akruphi

@LanThrusteR why didn't the advice in #1916 (comment) work for you?

Sorry, I didn't give it enough attention.

Whoever stumbles over the same problem:

  1. create ~/.config/far2l/cp (in my case it was missing)

  2. add two lines:

866
1251

  3. restart far2l

  4. open a text file containing Cyrillic text in CP1251 or DOS (CP866) encoding with F4

  5. cycle through encodings by pressing F8 and see when it becomes readable
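As a cross-check outside far2l, the same guesswork can be done with iconv (assuming a glibc/GNU iconv and the CP1251/CP866 names used above): whichever decoding yields readable Cyrillic is likely the file's real encoding. A sketch with a self-generated sample file:

```shell
# Generate a CP1251-encoded sample, then probe it with two decodings.
printf 'Привет' | iconv -f UTF-8 -t CP1251 > /tmp/sample.txt
iconv -f CP1251 -t UTF-8 /tmp/sample.txt   # readable: the right guess
iconv -f CP866  -t UTF-8 /tmp/sample.txt   # mojibake: the wrong guess
```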

Thanks!

LanThrusteR avatar Feb 08 '24 00:02 LanThrusteR

Or just set a Russian locale system-wide.

By the way, should this issue still be kept open?

unxed avatar Apr 14 '24 18:04 unxed