
Should add flags for filename encodings

Open fragglet opened this issue 2 years ago • 2 comments

The code currently does no translation of filename encodings, and filenames can be encoded in a variety of ways. In particular, Shift-JIS and EUC support are important, since the lha format is/was very popular in Japan. Unfortunately these will need to be specified manually since, as far as I know, there is no way to detect the encodings. We should internally translate everything to UTF-8.
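As a rough illustration of the "manually specified, translated to UTF-8" idea, here is a minimal Python sketch. It is not lhasa code (lhasa is written in C); the function name, the accepted encoding names, and the reading of "EUC" as EUC-JP are all assumptions made for the example.

```python
# Illustrative only: every name here is hypothetical, not part of lhasa.

# Hypothetical user-facing encoding names mapped to Python codec names.
# "EUC" is assumed to mean EUC-JP.
SUPPORTED_ENCODINGS = {
    "shift-jis": "shift_jis",
    "euc-jp": "euc_jp",
    "utf-8": "utf_8",
}


def filename_to_utf8(raw: bytes, encoding: str) -> bytes:
    """Decode a raw archive filename using the encoding the user specified,
    then re-encode it as UTF-8. Undecodable bytes become U+FFFD."""
    codec = SUPPORTED_ENCODINGS[encoding.lower()]
    text = raw.decode(codec, errors="replace")
    return text.encode("utf-8")
```

The same bytes decode to different text under Shift-JIS and EUC-JP, which is why the choice has to come from the user rather than from the archive.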

There are some extended ASCII formats that can reasonably be autodetected based on the OS field: for example, CP437 is probably a sensible default for DOS archives (or the system codepage when running on Windows), and Mac Extended ASCII for macOS archives. If the encoding cannot be determined, then non-ASCII characters should become the Unicode replacement character.
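A minimal Python sketch of that fallback logic follows, purely as an illustration. The OS identifier values ('M' for MS-DOS, 'm' for Macintosh) are assumptions to be checked against the header format, the function and table names are hypothetical, and Python's `mac_roman` codec stands in for "Mac Extended ASCII".

```python
from typing import Optional

# Assumed OS identifier bytes from the archive header; verify before relying
# on them. Codec names are Python's, used only for illustration.
DEFAULT_CODEC_BY_OS = {
    ord("M"): "cp437",      # MS-DOS archives
    ord("m"): "mac_roman",  # Macintosh archives
}


def decode_filename(raw: bytes, os_type: int,
                    override: Optional[str] = None) -> str:
    """Pick a codec from an explicit override or the OS field, then decode."""
    codec = override or DEFAULT_CODEC_BY_OS.get(os_type)
    if codec is None:
        # Encoding unknown: ASCII passes through, every other byte
        # becomes the Unicode replacement character U+FFFD.
        return raw.decode("ascii", errors="replace")
    return raw.decode(codec, errors="replace")
```

An explicit override always wins over the OS-based guess, so a wrong default can still be corrected by the user.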

With this in place we can relax the current "safe print" code, although it is still important never to print a terminal escape character or anything in the C0/C1 control character ranges (and probably the Specials range too).
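A relaxed filter along those lines could look like the following Python sketch. It is an illustration under assumptions, not the existing "safe print" code: the function names are hypothetical, DEL (0x7F) is treated as unsafe as well, and U+FFFD is exempted from the Specials block so that the replacement character itself can still be shown.

```python
REPLACEMENT = "\ufffd"


def is_unsafe(ch: str) -> bool:
    """True for characters that should never reach the terminal."""
    cp = ord(ch)
    return (
        cp <= 0x1F                # C0 controls, including ESC (0x1B)
        or 0x7F <= cp <= 0x9F     # DEL (assumption) and C1 controls
        # Specials block U+FFF0..U+FFFF, except U+FFFD (assumption)
        or (0xFFF0 <= cp <= 0xFFFF and ch != REPLACEMENT)
    )


def safe_for_terminal(name: str) -> str:
    """Replace unsafe characters with '?', leave everything else intact."""
    return "".join("?" if is_unsafe(ch) else ch for ch in name)
```

Printable non-ASCII text passes through unchanged, while an embedded escape sequence loses its ESC byte and so can no longer affect the terminal.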

fragglet avatar Mar 29 '23 15:03 fragglet

Also, lha has been popular on AmigaOS. The default encoding seems to be Latin1, although there are different mappings for some countries, which don't easily fall into Latin1. I guess auto-detection for corner cases could be difficult, if not impossible. Perhaps an external mapfile given as a command-line option could help in such situations, so that lhasa doesn't need to make assumptions.
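To make the mapfile idea concrete, here is a small Python sketch. The file format (one `0xNN U+XXXX` pair per line, `#` for comments) and both function names are invented for illustration; nothing like this exists in lhasa today. Unlisted bytes fall back to Latin1.

```python
def load_mapfile(lines):
    """Build a 256-entry byte-to-character table from mapfile lines.

    Hypothetical format: "0xNN U+XXXX", with '#' starting a comment.
    Bytes not listed keep their Latin1 meaning.
    """
    table = [chr(i) for i in range(256)]  # Latin1: byte value == code point
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        byte_field, cp_field = line.split()
        # cp_field is assumed to start with "U+"; strip it before parsing.
        table[int(byte_field, 16)] = chr(int(cp_field[2:], 16))
    return table


def decode_with_table(raw: bytes, table) -> str:
    """Translate each byte of a raw filename through the table."""
    return "".join(table[b] for b in raw)
```

For example, a mapfile containing the single line `0xA4 U+20AC` (an arbitrary example mapping) would turn byte 0xA4 into the euro sign while leaving every other byte as Latin1.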

gryf avatar Oct 22 '23 16:10 gryf

Indeed, Latin1 looks strange:

...
[generic]                  909    2192  41.5% -lh5- 651e Nov 24  2018 AmiArcadia/Source/generic/espaÐl.ct
[generic]                  935    2225  42.0% -lh5- 3231 Nov 24  2018 AmiArcadia/Source/generic/franíÂis.ct
...

http://aminet.net/package/misc/emu/AmiArcadiaMOS

polluks avatar Dec 18 '23 10:12 polluks