js-codepage
ISO 2022 JIS Japanese encoding fails
Hi, thanks very much for your work on this repository, it's incredibly useful. We use it as the main character encoding library for CyberChef.
We've recently noticed an issue when encoding text as ISO 2022 JIS Japanese: only null bytes are returned.
The affected CP numbers are 50220, 50221 and 50222.
Example code
import cptable from "codepage";
cptable.utils.encode(50220, "こんにちは");
Expected output
Uint8Array(16) [27, 36, 66, 36, 51, 36, 115, 36, 75, 36, 65, 36, 79, 27, 40, 66]
Actual output
Uint8Array(5) [0, 0, 0, 0, 0]
Can you shed any light on this behaviour?
Another example that also fails:
Example code
import cptable from "codepage";
cptable.utils.encode(50220, "ーム");
Expected output
Uint8Array(10) [27, 36, 66, 33, 60, 37, 96, 27, 40, 66]
Actual output
Uint8Array(2) [0, 0]
Thanks for sharing! The ISO 2022 codepages 5022{0,1,2,5,7} are definitely incorrect: hiragana (like any other JIS X 0208 character) require an escape sequence, and those are not currently supported. Based on ECMA-35, the first kana "こ" should be encoded as 1B 24 42 24 33 (1B 24 42 to switch to the JIS X 0208 double-byte set, 24 for the hiragana row and 33 for the character itself), and the output should end with 1B 28 42 to switch back to ASCII. This will require a direct implementation of escape sequences and a new set of LUTs for the various character subsets.
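To illustrate the escape-sequence handling described above, here is a minimal, hypothetical sketch of a stateful encoder for CP 50220. It is not part of the codepage library: the function names are made up, and instead of full JIS X 0208 LUTs it only covers ASCII, hiragana (JIS row 4), katakana (row 5) and the prolonged sound mark "ー" (0x213C), relying on the fact that those two kana rows follow Unicode order.

```javascript
// Hypothetical sketch of a stateful ISO-2022-JP (CP 50220) encoder.
// Not the codepage library's API: names and coverage are assumptions.
// Covers only ASCII, hiragana, katakana and "ー"; a real encoder needs full JIS X 0208 LUTs.
const ESC_JIS_X0208 = [0x1b, 0x24, 0x42]; // ESC $ B: designate JIS X 0208-1983
const ESC_ASCII = [0x1b, 0x28, 0x42]; // ESC ( B: designate ASCII

// Return the JIS X 0208 [row byte, cell byte] pair for a code point, or null.
function toJis0208(cp) {
  // Hiragana U+3041..U+3093 occupy row 4 (0x24), cells 0x21..0x73, in Unicode order.
  if (cp >= 0x3041 && cp <= 0x3093) return [0x24, 0x21 + (cp - 0x3041)];
  // Katakana U+30A1..U+30F6 occupy row 5 (0x25), cells 0x21..0x76.
  if (cp >= 0x30a1 && cp <= 0x30f6) return [0x25, 0x21 + (cp - 0x30a1)];
  // The prolonged sound mark U+30FC sits in row 1 at 0x213C.
  if (cp === 0x30fc) return [0x21, 0x3c];
  return null;
}

function encodeIso2022jp(text) {
  const out = [];
  let mode = "ascii"; // an ISO-2022-JP stream starts in ASCII
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      // Switch back to ASCII only when the previous character was double-byte.
      if (mode !== "ascii") {
        out.push(...ESC_ASCII);
        mode = "ascii";
      }
      out.push(cp);
      continue;
    }
    const pair = toJis0208(cp);
    if (pair === null) {
      throw new RangeError(`U+${cp.toString(16).toUpperCase()} is not covered by this sketch`);
    }
    // Emit ESC $ B once per run of double-byte characters.
    if (mode !== "jis0208") {
      out.push(...ESC_JIS_X0208);
      mode = "jis0208";
    }
    out.push(...pair);
  }
  // End in ASCII so the next stream starts in a known state.
  if (mode !== "ascii") out.push(...ESC_ASCII);
  return Uint8Array.from(out);
}

// Format bytes as uppercase hex, matching the notation used in this thread.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(encodeIso2022jp("こんにちは")));
// 1B 24 42 24 33 24 73 24 4B 24 41 24 4F 1B 28 42
console.log(encodeIso2022jp("ーム").join(", "));
// 27, 36, 66, 33, 60, 37, 96, 27, 40, 66
```

A single escape sequence covers a whole run of double-byte characters, so "こんにちは" costs 16 bytes rather than 5 × 5.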
PS: All of the generated codepages with source listed as "Windows 7" are assumed to be either single-byte or double-byte. Clearly that wasn't the case here.
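ISO-2022-JP is stateful: the width and meaning of each byte depend on the last escape sequence seen, which is why a fixed single-byte or double-byte table cannot describe it. The hypothetical decoder sketch below (names and coverage are assumptions; it uses a tiny computed table for ASCII, hiragana, katakana and "ー" instead of real LUTs) shows byte 0x24 decoding as "$" in ASCII mode but as the first byte of "こ" after ESC $ B.

```javascript
// Hypothetical sketch of a stateful ISO-2022-JP decoder (not the codepage library's API).
// Handles only ESC $ B and ESC ( B, plus ASCII, hiragana, katakana and "ー".
function fromJis0208(row, cell) {
  if (row === 0x24 && cell >= 0x21 && cell <= 0x73) return 0x3041 + (cell - 0x21); // hiragana
  if (row === 0x25 && cell >= 0x21 && cell <= 0x76) return 0x30a1 + (cell - 0x21); // katakana
  if (row === 0x21 && cell === 0x3c) return 0x30fc; // prolonged sound mark "ー"
  return null;
}

function decodeIso2022jp(bytes) {
  let mode = "ascii"; // decoding starts in ASCII
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    if (bytes[i] === 0x1b) {
      // Escape sequences change how all following bytes are read.
      const designation = `${bytes[i + 1]},${bytes[i + 2]}`;
      if (designation === "36,66") mode = "jis0208"; // ESC $ B
      else if (designation === "40,66") mode = "ascii"; // ESC ( B
      else throw new Error(`unsupported escape sequence at offset ${i}`);
      i += 3;
    } else if (mode === "ascii") {
      if (bytes[i] >= 0x80) throw new Error(`non-ASCII byte at offset ${i}`);
      out += String.fromCharCode(bytes[i]); // one byte per character
      i += 1;
    } else {
      const cp = fromJis0208(bytes[i], bytes[i + 1]); // two bytes per character
      if (cp === null) throw new Error(`unmapped JIS X 0208 pair at offset ${i}`);
      out += String.fromCodePoint(cp);
      i += 2;
    }
  }
  return out;
}

// The same byte 0x24 means "$" in ASCII mode but starts a hiragana pair after ESC $ B.
console.log(decodeIso2022jp([0x24])); // prints: $
console.log(decodeIso2022jp([0x1b, 0x24, 0x42, 0x24, 0x33, 0x1b, 0x28, 0x42])); // prints: こ
```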