
Add Compressor interface

Open lucagiac81 opened this issue 2 years ago • 4 comments

This PR refactors compression by introducing a Compressor interface, and is equivalent to PR 7650 in RocksDB. It is a first step towards supporting compressors as plugins; PR #585 covers the next step.

Compressor class

The Compressor class defines an interface for each compression algorithm to implement. It is a Customizable class, like other extensible components in Speedb.

A Compressor has:

  • A unique name
  • Compress/Uncompress methods
  • CreateDictionary method (for algorithms supporting dictionaries): the logic to build/train a dictionary is moved here, so future compressors have the option to customize it if necessary
  • Methods to handle processed/digested dictionaries (implemented by zstd, for example)
  • Options: each Compressor can define the options it supports using the Configurable framework
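As a rough illustration of the shape described above, here is a minimal sketch of such an interface together with a toy implementation. All names and signatures (`Compressor`, `DictionarySupported`, `RunLengthCompressor`, the `std::string`-based buffers) are assumptions made for the example and are not taken from the PR; the Customizable/Configurable machinery and the processed-dictionary methods are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch only: names and signatures are illustrative and are
// not the interface introduced by this PR.
class Compressor {
 public:
  virtual ~Compressor() = default;

  // Unique name identifying the algorithm.
  virtual const char* Name() const = 0;

  // Compress `input` into `*output`; returns false on failure.
  virtual bool Compress(const std::string& input,
                        std::string* output) const = 0;

  // Inverse of Compress; returns false on malformed input.
  virtual bool Uncompress(const std::string& input,
                          std::string* output) const = 0;

  // Whether the algorithm can make use of a dictionary.
  virtual bool DictionarySupported() const { return false; }

  // Build a dictionary from sample data. The default here simply keeps the
  // first `max_bytes` of the samples; a subclass could train one instead.
  virtual std::string CreateDictionary(const std::string& samples,
                                       size_t max_bytes) const {
    return samples.substr(0, max_bytes);
  }
};

// Toy run-length encoder standing in for a real algorithm.
// Output format: repeated (count byte in 1..255, value byte) pairs.
class RunLengthCompressor : public Compressor {
 public:
  const char* Name() const override { return "RunLength"; }

  bool Compress(const std::string& input,
                std::string* output) const override {
    assert(output != nullptr);
    output->clear();
    size_t i = 0;
    while (i < input.size()) {
      size_t run = 1;
      while (i + run < input.size() && input[i + run] == input[i] &&
             run < 255) {
        ++run;
      }
      output->push_back(static_cast<char>(run));
      output->push_back(input[i]);
      i += run;
    }
    return true;
  }

  bool Uncompress(const std::string& input,
                  std::string* output) const override {
    assert(output != nullptr);
    if (input.size() % 2 != 0) return false;  // pairs only
    output->clear();
    for (size_t i = 0; i < input.size(); i += 2) {
      size_t run = static_cast<unsigned char>(input[i]);
      if (run == 0) return false;  // a zero-length run is never emitted
      output->append(run, input[i + 1]);
    }
    return true;
  }
};
```

Callers would hold a `Compressor` pointer and never need to know which algorithm sits behind it, which is what makes a later plugin mechanism possible.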

Built-in compressors

The existing compression algorithms (referred to as "built-in") are encapsulated in Compressor classes. The classes inherit from BuiltinCompressor, which includes functionality shared by all built-in compressors. Built-in compressors can be referenced by their numeric id (as defined by the CompressionType enum) to ensure backward compatibility. BuiltinCompressor uses the existing CompressionOptions struct as its configurable options.
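The numeric-id lookup could be pictured as a small two-way table between ids and names. The sketch below is an assumption-laden illustration: `BuiltinNames` and its methods are hypothetical, the name strings are placeholders, and the id values are meant to mirror a subset of the CompressionType enum.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: a two-way table between numeric ids and names for
// built-in compressors. The class and the name strings are illustrative.
class BuiltinNames {
 public:
  BuiltinNames() {
    // Ids intended to mirror a subset of the CompressionType enum.
    Add(0x0, "NoCompression");
    Add(0x1, "Snappy");
    Add(0x2, "Zlib");
    Add(0x7, "ZSTD");
  }

  // Looks up the name for a numeric id; returns false if unknown.
  bool NameForId(uint8_t id, std::string* name) const {
    assert(name != nullptr);
    auto it = by_id_.find(id);
    if (it == by_id_.end()) return false;
    *name = it->second;
    return true;
  }

  // Looks up the numeric id for a name; returns false if unknown.
  bool IdForName(const std::string& name, uint8_t* id) const {
    assert(id != nullptr);
    auto it = by_name_.find(name);
    if (it == by_name_.end()) return false;
    *id = it->second;
    return true;
  }

 private:
  void Add(uint8_t id, const std::string& name) {
    by_id_[id] = name;
    by_name_[name] = id;
  }

  std::map<uint8_t, std::string> by_id_;
  std::map<std::string, uint8_t> by_name_;
};
```

Keeping the numeric id as a first-class key means data written before the refactoring, which records only the id, can still be mapped to the right compressor.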

Compressor lifecycle

For this PR, compression options exposed in the public API are unchanged (exposing Compressor in the public API and options is the scope of PR #585).

  • The CompressionType and CompressionOptions passed through ColumnFamilyOptions and DBOptions are used to instantiate suitable Compressor instances (this is done in MutableCFOptions and ImmutableDBOptions).
  • The Compressor class keeps track of the instances currently in use, so they can be retrieved and reused.
  • Such Compressor instances are then used in other parts of the code in place of CompressionType and CompressionOptions (e.g., in BlockBasedTableBuilder, BlockBasedTable, FlushJob, Compaction, BlobFileBuilder...).
  • The details of the Compressor used for a block-based table are serialized in the Properties block.
  • When opening the SST file, the info in the Properties block is used to instantiate/retrieve a suitable Compressor for the table. If the compression id for a block doesn't match the table-level Compressor, a suitable Compressor is obtained when reading the block.
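The "track instances so they can be retrieved and reused" step could look roughly like the cache below. This is a sketch under assumptions: the key (type plus two option fields), the use of `std::weak_ptr`, and every name (`CompressorCache`, `SketchCompressor`, `SketchOptions`) are hypothetical and chosen only to show the idea.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <tuple>

// Hypothetical stand-ins for CompressionOptions and a Compressor instance.
struct SketchOptions {
  int level = 0;
  int window_bits = 0;
};

struct SketchCompressor {
  int type;
  SketchOptions opts;
};

// Sketch of a cache that hands out one shared instance per distinct
// (type, options) combination for as long as someone still holds it.
class CompressorCache {
 public:
  std::shared_ptr<SketchCompressor> GetOrCreate(int type,
                                                const SketchOptions& opts) {
    std::lock_guard<std::mutex> lock(mu_);
    Key key = std::make_tuple(type, opts.level, opts.window_bits);
    auto it = live_.find(key);
    if (it != live_.end()) {
      // Reuse the instance if it is still referenced somewhere.
      if (auto existing = it->second.lock()) return existing;
    }
    auto created =
        std::make_shared<SketchCompressor>(SketchCompressor{type, opts});
    live_[key] = created;  // weak reference: the cache does not keep it alive
    return created;
  }

 private:
  using Key = std::tuple<int, int, int>;

  std::mutex mu_;
  std::map<Key, std::weak_ptr<SketchCompressor>> live_;
};
```

With weak references, an instance disappears once the last table builder or reader releases it, and a later request with the same options simply creates a fresh one.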

TODO

Streaming compression is not included in the Compressor class yet. The plan is to cover that in a separate PR. It could be offered through additional methods in the Compressor class. For example:

  • CreateStreamingCompressContext(): returns a context object similar to StreamingCompress
  • StreamingCompress(context,...): compress method takes context and the usual input/output buffers
  • Similar methods for uncompress
  • StreamingSupported(): to query capability of a Compressor
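To make the proposal more concrete, the sketch below shows one way the context-based methods might fit together. The signatures, the `StreamingContext` struct, and the "return the number of bytes still to be drained" convention are all assumptions for illustration; the "compression" itself is a plain copy used as a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical per-stream state owned by the caller.
struct StreamingContext {
  std::string pending;  // bytes accepted but not yet written out
  size_t pos = 0;       // read offset into `pending`
};

// Sketch of the proposed streaming methods. The payload is copied through
// unchanged; a real compressor would transform it.
class StreamingSketch {
 public:
  bool StreamingSupported() const { return true; }

  std::unique_ptr<StreamingContext> CreateStreamingCompressContext() const {
    return std::unique_ptr<StreamingContext>(new StreamingContext());
  }

  // Accepts `input` (may be empty to keep draining), writes at most
  // `out_cap` bytes to `out`, stores the count in `*out_len`, and returns
  // the number of bytes still waiting to be written.
  size_t StreamingCompress(StreamingContext* ctx, const std::string& input,
                           char* out, size_t out_cap, size_t* out_len) const {
    assert(ctx != nullptr && out != nullptr && out_len != nullptr);
    ctx->pending.append(input);
    size_t n = std::min<size_t>(out_cap, ctx->pending.size() - ctx->pos);
    std::memcpy(out, ctx->pending.data() + ctx->pos, n);
    ctx->pos += n;
    *out_len = n;
    size_t remaining = ctx->pending.size() - ctx->pos;
    if (remaining == 0) {
      // Everything drained: reset so the context can be reused.
      ctx->pending.clear();
      ctx->pos = 0;
    }
    return remaining;
  }
};
```

A caller would invoke `StreamingCompress` once with the input and then repeatedly with an empty input until it returns 0, flushing the output buffer after each call.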

lucagiac81 avatar Jun 26 '23 22:06 lucagiac81

@lucagiac81 please rebase

ofriedma avatar Oct 18 '23 10:10 ofriedma

@ofriedma, @speedbmike - Please review

udi-speedb avatar Oct 29 '23 09:10 udi-speedb

The code is rebased on the latest main, which is based on RocksDB 8.6.7 (the previous code was based on RocksDB 8.1.1).

One note regarding ZSTDContext:

  • ZSTDContext replaces CompressionContext, making it Compressor-specific. A Compressor can manage its own context as needed (only ZSTD currently needs this).
  • In the previous version of the PR, there was one ZSTDContext instance per thread.
  • After RocksDB PR 11666, ZSTDContext depends on the options selected for ZSTDCompressor. So, different ZSTDCompressor instances (with potentially different options) should not share ZSTDContext instances.
  • The latest code still has one ZSTDContext instance per thread, but internally that instance creates a separate context for each ZSTDCompressor instance. The net effect is one context per thread per ZSTDCompressor instance.
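The "one context per thread per compressor instance" arrangement can be sketched with a thread-local map keyed by the compressor's address. This is only an illustration under assumptions: `SketchContext`, `SketchZstdCompressor`, and `ThreadLocalContext` are hypothetical names, and the real ZSTDContext holds zstd state rather than a single integer.

```cpp
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a context whose setup depends on the
// compressor's options.
struct SketchContext {
  int level;
};

class SketchZstdCompressor {
 public:
  explicit SketchZstdCompressor(int level) : level_(level) {}

  // Returns this thread's context for this compressor, creating it on
  // first use. Each thread has its own map, so no locking is needed.
  SketchContext* ThreadLocalContext() const {
    thread_local std::unordered_map<const SketchZstdCompressor*,
                                    std::unique_ptr<SketchContext>>
        per_thread;
    std::unique_ptr<SketchContext>& slot = per_thread[this];
    if (!slot) {
      slot.reset(new SketchContext{level_});
    }
    // Simplification: entries are never erased, so a destroyed compressor
    // whose address is later reused would see a stale context.
    return slot.get();
  }

 private:
  int level_;
};
```

Two compressors configured with different options therefore never share a context, while repeated calls from the same thread on the same compressor reuse one.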

lucagiac81 avatar Jan 11 '24 16:01 lucagiac81

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Feb 01 '24 08:02 CLAassistant