
[CLI] Use a higher compression ratio (slower) zstd setting by default


jhurliman, Oct 10 '22 19:10

related: https://github.com/foxglove/mcap/issues/646

wkalt, Oct 10 '22 20:10

@jhurliman Is it possible to detect compression settings from an existing compressed blob?

defunctzombie, Nov 22 '22 19:11

@defunctzombie no, but I'm not sure why you would want to. When rewriting an MCAP file, you want to find a balance between processing time and the resulting compression ratio in the current context of whatever hardware you're running on and how much time you have to spare. The input file(s) were compressed in a different context (potentially different hardware, maybe real-time recording vs offline processing).
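As a concrete illustration of that trade-off, here is a minimal sketch using the klauspost/compress zstd encoder (the Go library discussed later in this thread). The encoder-level constants are the library's own; the helper function and the offline/real-time split are purely illustrative, not the CLI's actual code.

```go
package compressutil

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// newChunkEncoder is a hypothetical helper that picks a zstd encoder level
// from context: SpeedDefault keeps real-time recording cheap, while
// SpeedBestCompression spends extra CPU for a smaller file when rewriting
// offline.
func newChunkEncoder(w io.Writer, offline bool) (*zstd.Encoder, error) {
	level := zstd.SpeedDefault
	if offline {
		level = zstd.SpeedBestCompression
	}
	return zstd.NewWriter(w, zstd.WithEncoderLevel(level))
}
```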

jhurliman, Nov 22 '22 21:11

> The input file(s) were compressed in a different context (potentially different hardware, maybe real-time recording vs offline processing).

Is that an argument for the CLI being able to produce better-compressed files with the default CLI flags? Or that the C++ SDK should change to use faster compression by default?

defunctzombie, Nov 22 '22 21:11

For the CLI to be able to produce equal-or-better-compressed files with the default CLI flags. The C++ SDK is fine I think; we have benchmark data showing it's a competitive recorder.

jhurliman, Nov 22 '22 22:11

Linear: FG-940

foxhubber[bot], Dec 13 '22 01:12

I took a look at this. The Go zstd encoder library we use (klauspost/compress) is not able to achieve the compression ratios that the reference implementation can, even in its best-compression mode. Our only option for achieving that result is to switch to the reference implementation via https://github.com/DataDog/zstd . There are a few tradeoffs associated with this:

  1. klauspost/compress is designed to encode and decode with zero allocations, and it can retain allocated memory between compression sessions via the Reset(io.Writer) API (see the sketch after this list). DataDog/zstd does not do this, nor is minimal allocation behavior one of its stated goals.
  2. DataDog/zstd uses cgo, with the associated compilation-speed penalties and portability issues.
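To make point 1 concrete, here is a minimal sketch of that reuse pattern: one encoder is allocated up front and Reset against each chunk's output buffer, so its internal buffers are retained between chunks. The compressChunks helper is illustrative; only the zstd.NewWriter, Reset, Write, and Close calls are the klauspost/compress API.

```go
package compressutil

import (
	"bytes"

	"github.com/klauspost/compress/zstd"
)

// compressChunks compresses each chunk as an independent zstd stream while
// reusing a single Encoder (and its internal buffers) for all of them.
func compressChunks(chunks [][]byte) ([][]byte, error) {
	enc, err := zstd.NewWriter(nil) // output writer is supplied later via Reset
	if err != nil {
		return nil, err
	}

	out := make([][]byte, 0, len(chunks))
	for _, chunk := range chunks {
		var buf bytes.Buffer
		enc.Reset(&buf) // start a new stream, keeping allocated memory
		if _, err := enc.Write(chunk); err != nil {
			return nil, err
		}
		if err := enc.Close(); err != nil { // finish this chunk's frame
			return nil, err
		}
		out = append(out, buf.Bytes())
	}
	return out, nil
}
```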

Because of these tradeoffs I'm wary of switching implementations without fairly extensive benchmarking. The option I'd rather pursue is to add a pluggable compression API to the Go MCAP library, similar to the TypeScript implementation. That would let users choose the encoder/decoder implementation that works best for them.
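For illustration only, such a hook could look something like the sketch below. The ChunkCompressor interface and the writerOptions field are hypothetical names, not part of the current Go library.

```go
package mcaputil

import "io"

// ChunkCompressor is a hypothetical interface a writer could accept so callers
// can supply whichever zstd (or other) implementation suits them, e.g.
// klauspost/compress or DataDog/zstd.
type ChunkCompressor interface {
	// Compressor wraps w in a compressing stream for one chunk.
	Compressor(w io.Writer) (io.WriteCloser, error)
	// Name returns the value to record in the Chunk record's compression
	// field, e.g. "zstd" or "lz4".
	Name() string
}

// writerOptions shows where such a hook might plug in; a nil ChunkCompressor
// would fall back to the built-in encoders. Field names are illustrative.
type writerOptions struct {
	ChunkSize       int64
	ChunkCompressor ChunkCompressor
}
```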

I'm removing the bug label from this issue - it does not fit the definition in the handbook:

> A bug occurs when a user is unable to complete a supported task, or our software does not behave as intended.

james-rms, Dec 20 '22 05:12