iconv-lite

Add ASCII-compatible charset checking method.

Open mscdex opened this issue 10 years ago • 3 comments

It can be useful to check for ASCII-compatible charsets when decoding data that is known to only have bytes that fall within the ASCII range. Doing such a check avoids having to do useless decoding.

mscdex, Jan 11 '15 21:01

Interesting, although I'm sure it can be done without an additional generated file. Could you describe in more detail what you mean by an "ASCII-compatible" charset? How will that save you decoding? In my view, utf8 is not ASCII-compatible at all :)

ashtuchkin, Jan 12 '15 14:01

ASCII-compatible means bytes 0x00-0x7F in an encoding are all ASCII characters/bytes and not some other characters. Many character sets are compatible in this way, but some are not.
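To make the definition concrete, here is a rough sketch of how the property could be tested for a single-byte encoding. It assumes the encoding is described by a 256-character string in which index i holds the character that byte i decodes to. The helper name `isAsciiCompatibleTable` and the two toy tables are made up for illustration and are not taken from iconv-lite's internals.

```javascript
// Hypothetical helper: `table` is assumed to be a 256-character string in
// which table[i] is the character that byte value i decodes to.
function isAsciiCompatibleTable(table) {
  for (let i = 0; i < 0x80; i++) {
    // ASCII-compatible: byte i must decode to the character with code point i.
    if (table.charCodeAt(i) !== i) return false;
  }
  return true;
}

// Toy table where every byte decodes to the code point of the same value.
let latin1Like = '';
for (let i = 0; i < 256; i++) latin1Like += String.fromCharCode(i);

// Same table, except byte 0x5C decodes to the yen sign instead of a backslash,
// which is enough to break ASCII compatibility.
const yenVariant =
  latin1Like.slice(0, 0x5c) + '\u00a5' + latin1Like.slice(0x5d);

console.log(isAsciiCompatibleTable(latin1Like)); // true
console.log(isAsciiCompatibleTable(yenVariant)); // false
```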

Checking whether a destination encoding is ASCII-compatible is useful if you are already traversing binary data and can check whether each byte is <= 0x7F. If there are no bytes above 0x7F and the encoding is ASCII-compatible, you don't have to run the entire set of data through a decoder, which would just give you back the same string anyway.
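A minimal sketch of that fast path, under some assumptions: the `ASCII_COMPATIBLE` set is a hand-written stand-in for whatever lookup the library would provide, `decodeWithFastPath` is a hypothetical name, and `slowDecode` is any full decoder passed in by the caller (for example a wrapper around iconv.decode). The byte scan is done separately here for clarity. In practice it would be folded into the traversal that is already happening.

```javascript
// Hypothetical allow-list of encodings whose bytes 0x00-0x7F are plain ASCII.
const ASCII_COMPATIBLE = new Set(['utf8', 'latin1', 'windows1252']);

// Returns true when no byte in the buffer is above 0x7F.
function isAsciiOnly(buf) {
  for (let i = 0; i < buf.length; i++) {
    if (buf[i] > 0x7f) return false;
  }
  return true;
}

// Skips the full decoder when the data is pure ASCII and the encoding is
// ASCII-compatible; otherwise falls back to the supplied decoder.
function decodeWithFastPath(buf, encoding, slowDecode) {
  if (ASCII_COMPATIBLE.has(encoding) && isAsciiOnly(buf)) {
    // Each byte maps directly to the character with the same code point.
    return buf.toString('latin1');
  }
  return slowDecode(buf, encoding);
}
```

With this shape, `decodeWithFastPath(Buffer.from('hello'), 'utf8', slowDecode)` never calls `slowDecode`, while a buffer containing any byte above 0x7F, or an encoding that is not in the set, always goes through it.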

The reason I use a pre-generated file for this is that it's significantly faster to use an object (with fast properties) than to do a lookup on the fly.

mscdex, Jan 12 '15 14:01

Got it, makes sense. I'm thinking of reusing the current caching architecture, but not making it part of the public API yet (that would require tests, docs, etc.). What do you think about iconv.getCodec(encodingName).asciiCompatible? It's cached, but it must load the codec's data once.
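Roughly, the shape of that idea could look like the sketch below. It is an illustration only, not the library's actual implementation: the standalone `getCodec` function, the `codecCache` map and the `toyLoader` are all hypothetical, and the loader simply hard-codes the flag where a real codec would derive it from its encoding data.

```javascript
// Hypothetical cache keyed by encoding name.
const codecCache = new Map();

// Loads the codec on first use and returns the cached object afterwards,
// so the asciiCompatible flag is only computed once per encoding.
function getCodec(name, loadCodec) {
  let codec = codecCache.get(name);
  if (codec === undefined) {
    codec = loadCodec(name);
    codecCache.set(name, codec);
  }
  return codec;
}

// Toy loader that counts how often it runs. A real loader would read the
// encoding data and compute the flag from it.
let loads = 0;
function toyLoader(name) {
  loads++;
  return { name, asciiCompatible: name !== 'utf16le' };
}

const first = getCodec('utf8', toyLoader).asciiCompatible;  // loads the codec
const second = getCodec('utf8', toyLoader).asciiCompatible; // served from cache
```

After the first lookup the property read is just a cached object access, which keeps the per-call cost close to that of a pre-generated lookup object.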

ashtuchkin, Jan 12 '15 15:01