pako icon indicating copy to clipboard operation
pako copied to clipboard

Pako 1.x -> 2.x raw decompression regression with intermediate flush(?)

Open ngladitz opened this issue 3 years ago • 5 comments

Given the following package.json:

{
  "name": "pako-test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "pako1": "npm:pako@^1.0.11",
    "pako2": "npm:pako@^2.0.4"
  }
}

And following index.mjs:

import {default as pako1} from 'pako1';
import * as pako2 from 'pako2';

const original = new Uint8Array([1,2,3,4,5,6,7,8,9,10]);
const extraInput = new Uint8Array([11,12]);

const deflator = new pako1.Deflate({raw: true});

deflator.push(original, 3);
const partialLength = deflator.strm.total_out;

deflator.push(extraInput, true);

const fullCompressed = deflator.result;

const partialCompressed = fullCompressed.subarray(0, partialLength);

console.log("fullCompressed", fullCompressed);
console.log("partialCompressed", partialCompressed);

const decompressed1 = pako1.inflateRaw(partialCompressed);
const decompressed2 = pako2.inflateRaw(partialCompressed);

console.log("original", original);
console.log("decompressed1", decompressed1);
console.log("decompressed2", decompressed2);

Running node index.mjs the output I get is:

fullCompressed Uint8Array(20) [
   98, 100,  98, 102, 97, 101, 99,
  231, 224, 228,   2,  0,   0,  0,
  255, 255, 227, 230,  1,   0
]
partialCompressed Uint8Array(16) [
  98, 100,  98, 102, 97, 101,
  99, 231, 224, 228,  2,   0,
   0,   0, 255, 255
]
original Uint8Array(10) [
  1, 2, 3, 4,  5,
  6, 7, 8, 9, 10
]
decompressed1 Uint8Array(10) [
  1, 2, 3, 4,  5,
  6, 7, 8, 9, 10
]
decompressed2 undefined

inflateRaw returns the expected output in version 1 but returns nothing (undefined) in version 2.

I might have missed an explicit / intended behavior change that explains this and maybe there is some specific option that would also make this work with version 2 otherwise this looks like a regression?

ngladitz avatar Aug 13 '21 10:08 ngladitz

See https://github.com/nodeca/pako/issues/234#issuecomment-887096829

v2 changed wrapper to "standard one", according to zlib docs. v1 terminated process on sync mark, that caused problem with multistream files. v2 does NOT emit end on sync mark, but still should flush data (and you can join it manually anytime).

In short - code, working in v1 used buggy behaviour :). Wrapper is not magical, and can not fit all possible use cases. If you need partial compress/decompress - see wrapper src and tweak for your need.

Also, if you know popular multistream use cases - you can suggest tests, to be sure this will not be broken in future.

puzrin avatar Aug 13 '21 12:08 puzrin

Thank you for quick reply.

I am not sure which code specifically is the wrapper (I am not familiar with the internal architecture) and the noted comment seems to reference deflation rather than inflation but I tried the following (extending the code above):

const inflator2 = new pako2.Inflate({raw: true});
inflator2.push(partialCompressed, true);
console.log("inflator2", inflator2);

I do see the inflated content is still in inflator2.strm.output; if I understood you correctly it should have been flushed ... does that mean it should have been transferred to inflator2.chunks (currently empty array)?

Also not sure what multistream means specifically in this context; my use case is I've got a compressed stream containing a stack of images (written natively with zlib). Each image is flushed explicitly (locations in the output stream are remembered / stored) with Z_FULL_FLUSH to allow extracting specific single images from the stack without extracting the entire stack.

ngladitz avatar Aug 13 '21 13:08 ngladitz

I am not sure which code specifically is the wrapper

https://github.com/nodeca/pako/tree/master/lib - those 2 files are top level wrappers for zlib port. If you wish to use partial compress-decompress - you should understand what happens there, and be able to patch/replace for your needs.

I do see the inflated content is still in inflator2.strm.output; if I understood you correctly it should have been flushed ... does that mean it should have been transferred to inflator2.chunks (currently empty array)?

See wrapper sources first. In theory, Z_xxx_FLUSH should force pending output to .chunks. But i don't used this mode. The only thing i can say, wrapper use the same logic as zlib doc.

Also, you can create your own custom wrapper for your needs, and use zlib directly.

Also not sure what multistream means specifically in this context;

It means deflated data, consisting of several parts => having "sync mark" inside.

puzrin avatar Aug 13 '21 14:08 puzrin

Thank you for elaborating.

The current condition for the onData call that per default appends to .chunks is https://github.com/nodeca/pako/blob/0398fad238edc29df44f78e338cbcfd5ee2657d3/lib/inflate.js#L262

So there currently is no output chunk (even when flushing) unless the output area is completely full or the input stream was fully consumed.

Would you consider changing this to also consume the data (call onData()) when _flush_mode > 0? I'd then be able to get away with manually merging chunks from .chunks or overloading onData().

Being able to manually trigger / force finalization (onEnd()) might also be nice if it indeed can't be the default for some use cases.

I suppose I could write my own wrapper (the zlib interface itself isn't exported though(?)) or manually concatenate the data from both .chunks and .strm.output otherwise but I guess I'd be depending somewhat on implementation details too then.

ngladitz avatar Aug 13 '21 15:08 ngladitz

I have no objection about improving flush in inflate wrapper, but i don't know how to do it right. Previous attempt to add "everything possible" (in v1) ended with cryptic & unmanainable wrapper without tests. So i had to drop all reinvented weels and do everything from scratch.

If anyone has time, i'd suggest to investigate logic of nodejs native zlib wrapper. It's battle-tested and should be ok.

Or, you can just copy-paste inflate wrapper from v1. If you have only one use case, you don't need to worry about universal solution.

puzrin avatar Aug 13 '21 17:08 puzrin