CotEditor
Support for imperfect MacJapanese encoding
Is your feature request related to a problem? Please describe.
- Open the attached file MACPEOPLE-1998-NO2.txt
- The file encoding cannot be detected correctly
The file is mostly MacJapanese ("Japanese (Mac OS)"), but it contains some invalid characters (0x7F "DEL" and 0x00 "NUL", among others). The file was produced by dumping a directory listing with hfsutils (command: hls -1aR > file.txt). For some reason, old Japanese CD-ROMs have these strange characters in their filenames (perhaps due to errors in input methods such as Kotoeri or EGBRIDGE?). The CD-ROM ISO is available here: https://archive.org/details/macpeople-1998-no-2
Describe the solution you'd like
Existing steps:
1. Choose the encoding "Japanese (Mac OS)".
2. Select Reinterpret.
New step:
3. For any characters that are invalid in MacJapanese, interpret them as MacRoman.
...and the file will display correctly (tested with my own code).
Describe alternatives you've considered
- Process the text manually outside of CotEditor. This works OK as a proof of concept, but it's not a long-term solution.
Additional context
- Also see #1218
You might like to know that Tcl can change the encoding of these strange files.
https://gist.github.com/gingerbeardman/4a3b66236e018b72b32ca17953474e12
Thinking about this again, I think CotEditor's support for reading and converting MacJapanese is not as good as Tcl's (whose encoding definition was written by Apple).
I wonder how it differs?
Sorry for the very late reply, and for rejecting this. Currently, CotEditor lets the system API decode the data just by specifying the encoding. I don't think it's worth writing our own decoding algorithm for a specific combination of broken data and encoding.
I understand. I'm happy with CotEditor!