js-deflate

can't get any other compression library to recognize this format

Open thejoshwolfe opened this issue 8 years ago • 1 comments

Using python:

import zlib
zlib.decompress("w7NIw43DicOJBwA=".decode("base64"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check

using node.js:

> zlib.inflateSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: incorrect header check
...
> zlib.inflateRawSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: invalid distance too far back
...
> zlib.gunzipSync(new Buffer("w7NIw43DicOJBwA=", "base64"))
Error: incorrect header check
...

But I can get Python's compressed blobs to be accepted by node's inflateSync. (And I have experience using Python for PNG formatting, and node for ZIP file formatting.)

Is this project compliant with the DEFLATE spec?
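
For context on the errors above: inflateSync expects the zlib wrapper (RFC 1950), gunzipSync expects the gzip wrapper (RFC 1952), and raw DEFLATE has no header at all. Here is a rough sketch of the header checks involved. `sniffContainer` is a hypothetical helper name, not part of node's zlib or of js-deflate, and it is only a heuristic:

```javascript
const zlib = require("zlib");

// Hypothetical helper: guess which wrapper, if any, surrounds a DEFLATE stream.
function sniffContainer(buf) {
  // gzip streams start with the magic bytes 1f 8b.
  if (buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b) {
    return "gzip";
  }
  // zlib streams start with CMF/FLG: the low nibble of CMF is 8 (DEFLATE),
  // and the two bytes read as a big-endian integer are a multiple of 31.
  if (
    buf.length >= 2 &&
    (buf[0] & 0x0f) === 8 &&
    ((buf[0] << 8) | buf[1]) % 31 === 0
  ) {
    return "zlib";
  }
  // Raw DEFLATE, or not DEFLATE at all; the header alone cannot tell.
  return "unknown";
}

const blob = Buffer.from("w7NIw43DicOJBwA=", "base64");
console.log(sniffContainer(blob)); // starts with c3 b3, which fails both checks
console.log(sniffContainer(zlib.deflateSync(Buffer.from("Hello"))));
console.log(sniffContainer(zlib.gzipSync(Buffer.from("Hello"))));
```

A blob that fails both checks is why inflateSync and gunzipSync report "incorrect header check" before looking at any compressed data.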

thejoshwolfe, Apr 09 '16 04:04

Looks like the compressor/decompressor works fine internally; the problem is with test/base64.js. When you use the provided module to base64-encode the compressed data, it treats the input as Unicode code points instead of raw bytes.

  • For example, compressing "Hello" produces bytes [f3 48 cd c9 c9 07 00].

  • However, Base64.toBase64() treats the input as Unicode characters [U+00F3 U+0048 U+00CD U+00C9 U+00C9 U+0007 U+0000] and encodes them to bytes as UTF-8, resulting in U+00F3 ⇒ [c3 b3], U+0048 ⇒ [48], U+00CD ⇒ [c3 8d], and so on.

  • So after the UTF-8 encoding, you get [c3 b3 48 c3 8d c3 89 c3 89 07 00].
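
The effect described in the bullets above can be reproduced in node. This is a sketch of the same mangling using Buffer encodings, not the actual code in test/base64.js; the variable names are made up for illustration:

```javascript
// Raw DEFLATE bytes for "Hello", as listed above.
const raw = Buffer.from([0xf3, 0x48, 0xcd, 0xc9, 0xc9, 0x07, 0x00]);

// Correct path: base64-encode the bytes directly.
const good = raw.toString("base64");

// Buggy path: read each byte as a code point (Latin-1 decoding),
// then encode those code points as UTF-8 before base64.
const asCodePoints = raw.toString("latin1");
const mangled = Buffer.from(asCodePoints, "utf8");
const bad = mangled.toString("base64");

console.log(good);                    // 80jNyckHAA==
console.log(mangled.toString("hex")); // c3b348c38dc389c3890700
console.log(bad);                     // w7NIw43DicOJBwA=
```

Every byte at or above 0x80 becomes two bytes, which is why the blob from the original report is 11 bytes long instead of 7.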

To recover incorrectly encoded data, do the opposite: run it through a UTF-8 decoder, then write the resulting code points back out as single bytes (Latin-1):

> bad_buf = Buffer.from("w7NIw43DicOJBwA=", "base64");
<Buffer c3 b3 48 c3 8d c3 89 c3 89 07 00>
> good_buf = Buffer.from(bad_buf.toString("utf8"), "latin1");
<Buffer f3 48 cd c9 c9 07 00>
> zlib.inflateRawSync(good_buf).toString("latin1")
'Hello'

Or from a shell (GNU base64 encodes by default, so the last step needs no flag):

$ cat bad_data | base64 -d | iconv -f utf8 -t latin1 | base64 > good_data
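
If there are many blobs to fix, the same repair could be wrapped in a function that refuses input that does not look double-encoded. This is only a sketch; `repairDoubleEncoded` is a hypothetical name, and the two sanity checks are my assumption about what counts as safe to repair:

```javascript
const zlib = require("zlib");

// Hypothetical helper: undo one layer of "bytes read as code points, then UTF-8 encoded".
function repairDoubleEncoded(b64) {
  const bad = Buffer.from(b64, "base64");
  const text = bad.toString("utf8");

  // Invalid UTF-8 is decoded with replacement characters, so it will not round-trip.
  if (!Buffer.from(text, "utf8").equals(bad)) {
    throw new Error("not valid UTF-8, so probably not double-encoded");
  }
  // Each code point must fit in one byte to be written back out as Latin-1.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) {
      throw new Error("code point above U+00FF, cannot map back to a byte");
    }
  }
  return Buffer.from(text, "latin1");
}

const fixed = repairDoubleEncoded("w7NIw43DicOJBwA=");
console.log(fixed.toString("hex"));                         // f348cdc9c90700
console.log(zlib.inflateRawSync(fixed).toString("latin1")); // Hello
```

Feeding it a correctly encoded blob such as "80jNyckHAA==" throws instead of silently corrupting the data, because f3 48 is not a valid UTF-8 sequence.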

grawity, Jul 04 '19 17:07