Add support for compressor plugins
This is a follow-up to PR #7650, adding support for external compressor plugins. The first commit in this PR will match PR #7650 until that is merged.
The Compressor and related classes are made part of the public API to allow development of plugins and to expose compressors in options.
Compressor plugins can be used to easily integrate new compression algorithms into RocksDB. They can also implement compression techniques tailored to specific types of data. For example, if the values in a database are of numeric type (e.g., arrays of integers) with particular distributions (e.g., limited range within each block), the values could be compressed using a lightweight compression algorithm implemented as a plugin (such as frame-of-reference or delta encoding).
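As an illustration of the delta-encoding idea mentioned above, here is a minimal standalone sketch for a block of 32-bit integers. Each value is stored as its difference from the previous value, and each difference is written as a zigzag-mapped varint. A plugin's compress and decompress methods could wrap functions like these. The function names are hypothetical, and the sketch does not use the Compressor interface from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Appends v as a LEB128-style varint: 7 bits per byte, high bit = "more".
void AppendVarint(std::string* out, uint64_t v) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Reads one varint starting at *pos; returns false on truncated input.
bool ReadVarint(const std::string& in, size_t* pos, uint64_t* v) {
  uint64_t result = 0;
  for (int shift = 0; shift <= 63 && *pos < in.size(); shift += 7) {
    uint64_t byte = static_cast<unsigned char>(in[(*pos)++]);
    result |= (byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *v = result;
      return true;
    }
  }
  return false;
}

// Zigzag maps small negative and positive deltas to small unsigned values.
uint64_t ZigZag(int64_t d) {
  return (static_cast<uint64_t>(d) << 1) ^ static_cast<uint64_t>(d >> 63);
}
int64_t UnZigZag(uint64_t u) {
  return static_cast<int64_t>((u >> 1) ^ (0 - (u & 1)));
}

// Hypothetical "compress": element count, then each value's delta from the
// previous one (the first delta is taken from 0).
std::string DeltaEncode(const std::vector<int32_t>& values) {
  std::string out;
  AppendVarint(&out, values.size());
  int64_t prev = 0;
  for (int32_t v : values) {
    AppendVarint(&out, ZigZag(static_cast<int64_t>(v) - prev));
    prev = v;
  }
  return out;
}

// Hypothetical "decompress": rejects truncated, trailing, or out-of-range data.
bool DeltaDecode(const std::string& in, std::vector<int32_t>* values) {
  constexpr int64_t kMaxDelta = int64_t{0xFFFFFFFF};  // INT32_MAX - INT32_MIN
  size_t pos = 0;
  uint64_t n = 0;
  if (!ReadVarint(in, &pos, &n)) return false;
  values->clear();
  int64_t prev = 0;
  for (uint64_t i = 0; i < n; ++i) {
    uint64_t u = 0;
    if (!ReadVarint(in, &pos, &u)) return false;
    int64_t delta = UnZigZag(u);
    if (delta > kMaxDelta || delta < -kMaxDelta) return false;
    int64_t next = prev + delta;
    if (next < std::numeric_limits<int32_t>::min() ||
        next > std::numeric_limits<int32_t>::max()) {
      return false;
    }
    values->push_back(static_cast<int32_t>(next));
    prev = next;
  }
  return pos == in.size();  // no trailing bytes allowed
}
```

For a block of slowly varying values such as {1000, 1001, 1003, 1002, 1010}, most deltas fit in a single byte, so this sketch encodes the five 4-byte integers into 7 bytes.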
Options for Compressors
New options are added to support plugin compressors. For example, compression was previously configured by compression (of type CompressionType) and compression_opts (of type CompressionOptions). This PR adds a compressor option (a pointer to a Compressor) that specifies a compressor object, which encapsulates both the compression type and its options. The same approach is applied to the following options (the new option is shown in parentheses):
- ColumnFamilyOptions: compression (compressor), bottommost_compression (bottommost_compressor)
- AdvancedColumnFamilyOptions: compression_per_level (compressor_per_level), blob_compression (blob_compressor)
- CompactionOptions: compression (compressor)
The existing CompressionType/CompressionOptions options are preserved for backward compatibility. If the user doesn't specify compressors (leaving them null), the CompressionType/CompressionOptions options are used as before. Otherwise, compressors override the existing options.
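The precedence rule above can be summarized in a few lines. This sketch uses simplified, hypothetical stand-ins for CompressionType and Compressor (not the real RocksDB declarations). It also assumes the compressor option is held as a std::shared_ptr, which the text above does not state.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for this sketch only; the real RocksDB
// CompressionType enum and Compressor class have more members.
enum class CompressionType {
  kNoCompression,
  kSnappyCompression,
  kZSTD,
  kPluginCompression
};

class Compressor {
 public:
  virtual ~Compressor() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical helper mapping a legacy CompressionType to a display name.
std::string LegacyName(CompressionType type) {
  switch (type) {
    case CompressionType::kNoCompression: return "NoCompression";
    case CompressionType::kSnappyCompression: return "Snappy";
    case CompressionType::kZSTD: return "ZSTD";
    case CompressionType::kPluginCompression: return "Plugin";
  }
  return "Unknown";
}

// A non-null compressor overrides the legacy CompressionType;
// a null compressor falls back to it.
std::string EffectiveCompressorName(
    const std::shared_ptr<Compressor>& compressor,
    CompressionType legacy_type) {
  if (compressor != nullptr) return compressor->Name();
  return LegacyName(legacy_type);
}

// Hypothetical plugin used to exercise the rule.
class MyCompressor : public Compressor {
 public:
  std::string Name() const override { return "my_compressor"; }
};
```

With a null compressor and a legacy type of kZSTD, the sketch resolves to "ZSTD". Once a MyCompressor instance is set, it resolves to "my_compressor" regardless of the legacy type.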
A new constant kPluginCompression is defined in CompressionType for plugin compressors. The SST properties block stores information about the specific compressor in the compression_name field. This is used to instantiate a suitable compressor when opening the SST.
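One way to picture the lookup from compression_name to a compressor instance is a name-to-factory registry. The sketch below is hypothetical: CompressorRegistry, CompressorFactory, and NoopCompressor are illustrative names. It does not use RocksDB's ObjectRegistry or the PR's actual Compressor interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal interface; the PR's actual Compressor class differs.
class Compressor {
 public:
  virtual ~Compressor() = default;
  virtual std::string Name() const = 0;
};

using CompressorFactory = std::function<std::unique_ptr<Compressor>()>;

// Hypothetical registry keyed by the name stored in compression_name.
class CompressorRegistry {
 public:
  void Register(const std::string& name, CompressorFactory factory) {
    factories_[name] = std::move(factory);
  }

  // Returns nullptr when no plugin with this name is registered, e.g. when
  // the SST was written by a plugin that is not loaded in this process.
  std::unique_ptr<Compressor> CreateFromName(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, CompressorFactory> factories_;
};

// Hypothetical plugin that a library would register when it is loaded.
class NoopCompressor : public Compressor {
 public:
  std::string Name() const override { return "noop"; }
};
```

In this model, a reader that encounters kPluginCompression on a block would look up the SST's compression_name. A nullptr result (the plugin is not loaded) should be reported as an error rather than treated as uncompressed data.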
Option String Examples
Built-in compressor (existing options)
compression=kZSTD;compression_opts={level=1}
Built-in compressor (new options)
compressor={id=ZSTD;level=1}
Plugin compressor (new options)
compressor={id=my_compressor;my_option1=value1;my_option2=value2}
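In the option strings above, id selects the compressor, and the remaining key=value pairs appear to be passed to it as its options. The standalone sketch below illustrates that split. CompressorSpec and ParseCompressorSpec are hypothetical names. The sketch does not use RocksDB's own option-parsing code, and it deliberately ignores nested braces and escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical parsed form of a compressor option string.
struct CompressorSpec {
  std::string id;                              // value of the "id" key
  std::map<std::string, std::string> options;  // all other key=value pairs
};

// Parses "id=...;k1=v1;k2=v2", optionally wrapped in braces as in
// compressor={...}. Nested braces inside values are NOT supported.
// Returns false on an entry without '=', an empty key, or a missing id.
bool ParseCompressorSpec(std::string spec, CompressorSpec* out) {
  *out = CompressorSpec();
  if (spec.size() >= 2 && spec.front() == '{' && spec.back() == '}') {
    spec = spec.substr(1, spec.size() - 2);
  }
  size_t start = 0;
  while (start <= spec.size()) {
    size_t end = spec.find(';', start);
    if (end == std::string::npos) end = spec.size();
    const std::string entry = spec.substr(start, end - start);
    if (!entry.empty()) {
      const size_t eq = entry.find('=');
      if (eq == std::string::npos || eq == 0) return false;
      const std::string key = entry.substr(0, eq);
      const std::string value = entry.substr(eq + 1);
      if (key == "id") {
        out->id = value;
      } else {
        out->options[key] = value;
      }
    }
    start = end + 1;
  }
  return !out->id.empty();
}
```

Parsing "{id=my_compressor;my_option1=value1;my_option2=value2}" yields the id my_compressor plus two options. A string without an id, such as "level=1", is rejected.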
Options Object Example
Built-in compressor (existing options)
Options options;
options.compression = kZSTD;
options.compression_opts.level = 1;
Built-in compressor (new options)
Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
config_options,
"id=ZSTD;level=1",
&options.compressor);
Plugin compressor (new options)
Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
config_options,
"id=my_compressor;my_option1=value1;my_option2=value2",
&options.compressor);
db_bench
For db_bench, compression_type and individual compression options (such as compression_level) were left unchanged for backward compatibility. compression_type is still used with plugin compressors to specify their name. Other compressor options can be passed using compressor_options.
Built-in compressor (existing options)
--compression_type=zstd --compression_level=1
Built-in compressor (new options)
--compression_type=zstd --compressor_options="level=1"
Plugin compressor (new options)
--compression_type=my_compressor --compressor_options="my_option1=value1;my_option2=value2"
Limitations/Future Work
Compressor plugins are currently not supported for:
- WAL compression: it requires adding streaming compression to the Compressor interface, as described in PR #7650
- Compressed secondary cache
- BlobDB: the blob_compressor option allows passing options for built-in compressors, but plugins are not supported. Supporting plugins would require additional metadata to be stored with blob files.
These limitations will be addressed by future PRs.
@lucagiac81 This is very interesting. I am wondering if this is technically limited to just "Compression"? It looks like I could implement a Compressor with the compress/decompress methods that would actually do encryption/decryption instead. Perhaps there is scope to generalise your approach a little further?
@adamretter That's a great point. In general, this should work with any reversible data transformation. For encryption specifically, RocksDB already supports pluggable providers, but there could be other uses.
@mrambacher Thanks for your feedback! I started by splitting the PR into multiple commits. I'll work on the remaining items.