CotEditor: Support for imperfect MacJapanese encoding

Open · gingerbeardman opened this issue 4 years ago · 2 comments

Is your feature request related to a problem? Please describe.

Open the attached file MACPEOPLE-1998-NO2.txt.
The file's encoding cannot be determined correctly.

The file is mostly MacJapanese ("Japanese (Mac OS)"), but it contains some invalid characters (0x7F "DEL" and 0x00 "NUL", among others). The file was produced by dumping directory listing contents with hfsutils (using the command hls -1aR > file.txt). For some reason, old Japanese CD-ROMs have these strange characters in their filenames (perhaps due to errors in input methods such as Kotoeri or EGBRIDGE?). The CD-ROM ISO is available here: https://archive.org/details/macpeople-1998-no-2

Describe the solution you'd like

Existing steps:
1. Choose the encoding "Japanese (Mac OS)".
2. Select Reinterpret.

New step:
3. For any characters that are invalid in MacJapanese, interpret them as MacRoman.

...and the file will display correctly (tested with my own code).
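The proposed step 3 amounts to a decode with a per-byte fallback: decode with the primary encoding, and whenever a byte sequence is rejected, decode just those bytes as MacRoman and continue. The requester's own code is not shown in the issue, so the following is only a minimal sketch under stated assumptions. Python ships no MacJapanese codec, so `shift_jis` stands in for it here. The handler name `macroman_fallback` and the function `decode_with_fallback` are hypothetical names chosen for illustration, and the fallback relies on Python's `codecs.register_error` mechanism.

```python
import codecs

# Assumption: Python has no MacJapanese codec, so Shift_JIS is used as an
# approximation of "Japanese (Mac OS)" for this sketch.
PRIMARY = "shift_jis"
FALLBACK = "mac_roman"


def _mac_roman_fallback(exc: UnicodeError) -> tuple[str, int]:
    """Re-decode only the bytes the primary codec rejected, as MacRoman."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    rejected = exc.object[exc.start:exc.end]
    # MacRoman assigns a character to every byte value, so this cannot fail.
    # Returning exc.end tells the codec to resume right after the bad bytes.
    return rejected.decode(FALLBACK), exc.end


# Hypothetical handler name, registered once so any decode() call can use it.
codecs.register_error("macroman_fallback", _mac_roman_fallback)


def decode_with_fallback(data: bytes) -> str:
    """Decode data as the primary encoding, falling back to MacRoman per error."""
    return data.decode(PRIMARY, errors="macroman_fallback")


if __name__ == "__main__":
    # 0x82 0xA0 is "あ" in Shift_JIS; the trailing 0x80 is not valid Shift_JIS.
    print(decode_with_fallback(b"\x82\xa0\x80"))
```

One limitation of this sketch is that it only catches bytes the stand-in codec rejects. Control bytes such as 0x00 and 0x7F decode as ordinary ASCII control characters under `shift_jis`, so handling the characters described above would need MacJapanese's own definition of which bytes are invalid.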

Describe alternatives you've considered

Processing the text manually outside of CotEditor. This works OK as a proof of concept, but it's not a long-term solution.

Additional context

Also see #1218

Oct 27 '21 11:10 gingerbeardman

You might like to know that Tcl can change the encoding of these strange files.

https://gist.github.com/gingerbeardman/4a3b66236e018b72b32ca17953474e12

Nov 08 '21 23:11 gingerbeardman

Thinking about this again, I suspect that CotEditor's support for reading and converting MacJapanese is not as good as Tcl's (whose encoding definition was written by Apple).

I wonder how they differ.

Nov 27 '21 01:11 gingerbeardman

Sorry for my super late reply, and for rejecting this. Currently, CotEditor lets the system API decode data simply by specifying the encoding. I don't think it is worth writing our own decoding algorithm for a specific combination of broken data and encoding.

Mar 14 '23 12:03 1024jp

I understand. I'm happy with CotEditor!

Mar 14 '23 15:03 gingerbeardman
