[CLI] Use a higher compression ratio (slower) zstd setting by default
related: https://github.com/foxglove/mcap/issues/646
@jhurliman Is it possible to detect compression settings from an existing compressed blob?
@defunctzombie no, but I'm not sure why you would want to. When rewriting an MCAP file, you want to find a balance between processing time and the resulting compression ratio in the current context of whatever hardware you're running on and how much time you have to spare. The input file(s) were compressed in a different context (potentially different hardware, maybe real-time recording vs offline processing).
> The input file(s) were compressed in a different context (potentially different hardware, maybe real-time recording vs offline processing).

Is that an argument for the CLI being able to produce better-compressed files with the default CLI flags? Or that the C++ SDK should change to use a faster compression setting by default?
For the CLI to be able to produce equal-or-better-compressed files with the default CLI flags. The C++ SDK is fine I think; we have benchmark data showing it's a competitive recorder.
Linear: FG-940
I took a look at this. The Go zstd encoder library we use is not able to achieve the compression ratios that the reference implementation can, even in its best-compression mode. Our only option to achieve that result is to switch to the reference implementation by using https://github.com/DataDog/zstd. There are a few tradeoffs associated with this:
- `klauspost/compress` is designed to encode and decode with zero allocations, and includes the ability to retain allocated memory between compression sessions (with the `Reset(io.Writer)` API). `DataDog/zstd` does not do this, nor is minimal allocation behavior one of its stated goals.
- `DataDog/zstd` uses CGO, with associated compilation speed penalties and portability issues.
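To illustrate what "retaining allocated memory between compression sessions" looks like in practice, here is a minimal sketch of the reuse pattern. It is not the MCAP writer's code. Because zstd is not in the Go standard library, the sketch substitutes `compress/gzip`, whose `Writer` exposes a `Reset(io.Writer)` method of the same shape; `compressChunks` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressChunks (hypothetical helper) compresses every chunk into its own
// buffer while reusing a single encoder. Reset re-targets the encoder at a
// new destination so a fresh encoder is not constructed for each chunk.
func compressChunks(chunks [][]byte) ([][]byte, error) {
	// gzip is a stand-in here; a zstd encoder with Reset(io.Writer) would
	// be used the same way.
	enc := gzip.NewWriter(io.Discard)
	out := make([][]byte, 0, len(chunks))
	for i, chunk := range chunks {
		var buf bytes.Buffer
		enc.Reset(&buf)
		if _, err := enc.Write(chunk); err != nil {
			return nil, fmt.Errorf("chunk %d: write: %w", i, err)
		}
		// Close flushes the stream and writes the trailer. The encoder
		// can be Reset and reused afterwards.
		if err := enc.Close(); err != nil {
			return nil, fmt.Errorf("chunk %d: close: %w", i, err)
		}
		out = append(out, buf.Bytes())
	}
	return out, nil
}

func main() {
	chunks := [][]byte{
		bytes.Repeat([]byte("pose "), 100),
		bytes.Repeat([]byte("image "), 100),
	}
	compressed, err := compressChunks(chunks)
	if err != nil {
		panic(err)
	}
	for i, c := range compressed {
		fmt.Printf("chunk %d: %d -> %d bytes\n", i, len(chunks[i]), len(c))
	}
}
```

An encoder without a `Reset`-style method would have to be constructed anew for every chunk, which is the allocation cost the comment above is concerned about.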
Because of this, I'm wary of just switching implementations without fairly extensive benchmarking. The option I'd rather pursue is to add a pluggable compression API to the Go MCAP API, similar to the TypeScript implementation. That would allow users to choose the encoder/decoder implementation that works best for them.
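A pluggable compression API could take roughly the following shape. This is a sketch under stated assumptions, not the actual Go MCAP API: `ChunkCompressor`, `Factory`, `Register`, `Compress`, and `Decompress` are all hypothetical names, and `compress/gzip` stands in for a zstd implementation so that the example runs with the standard library alone.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// ChunkCompressor (hypothetical) is the contract a user-supplied encoder
// would satisfy: a write-closer that can be re-targeted and reused.
type ChunkCompressor interface {
	io.WriteCloser
	Reset(w io.Writer)
}

// Factory (hypothetical) builds a compressor that writes to dst.
type Factory func(dst io.Writer) (ChunkCompressor, error)

// registry maps a user-chosen name to the factory registered for it.
var registry = map[string]Factory{}

// Register makes an encoder implementation available under a name.
func Register(name string, f Factory) { registry[name] = f }

// gzipFactory is a stand-in implementation; a klauspost/compress or
// DataDog/zstd backed factory would plug in the same way.
func gzipFactory(level int) Factory {
	return func(dst io.Writer) (ChunkCompressor, error) {
		w, err := gzip.NewWriterLevel(dst, level)
		if err != nil {
			return nil, err
		}
		return w, nil
	}
}

// Compress runs data through whichever implementation is registered as name.
func Compress(name string, data []byte) ([]byte, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("no compressor registered for %q", name)
	}
	var buf bytes.Buffer
	w, err := f(&buf)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Decompress reverses the gzip stand-in so the round trip can be checked.
func Decompress(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// Two registrations of the same library at different speed/ratio settings.
	Register("fast", gzipFactory(gzip.BestSpeed))
	Register("small", gzipFactory(gzip.BestCompression))

	data := bytes.Repeat([]byte("mcap chunk payload "), 500)
	for _, name := range []string{"fast", "small"} {
		out, err := Compress(name, data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d -> %d bytes\n", name, len(data), len(out))
	}
}
```

With this kind of hook, the library would not need to commit to one encoder: a recorder could register a fast, allocation-conscious implementation, while an offline rewrite could register a slower one with a higher compression ratio.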
I'm removing the bug label from this issue - it does not fit the definition in the handbook:
> A bug occurs when a user is unable to complete a supported task, or our software does not behave as intended.