Add support for compressor plugins
This is a follow-up to PR #584, adding support for external compressor plugins. The first commit in this PR will match PR #584 until that is merged. It is equivalent to PR6717 in RocksDB.
The Compressor and related classes are made part of the public API to allow development of plugins and to expose compressors in options.
Compressor plugins can be used to easily integrate new compression algorithms into Speedb. They can also implement compression techniques tailored to specific types of data. For example, if the values in a database are of numeric type (e.g., arrays of integers) with particular distributions (e.g., limited range within each block), the values could be compressed using a lightweight compression algorithm implemented as a plugin (such as frame-of-reference or delta encoding).
Options for Compressors
New options are added to support plugin compressors. For example, compression was previously configured by compression (of type CompressionType) and compression_opts (of type CompressionOptions). This PR adds a compressor option (pointer to Compressor) to specify a compressor object (which includes type and options). This approach was followed for the following options:
- ColumnFamilyOptions: compression (compressor), bottommost_compression (bottommost_compressor)
- AdvancedColumnFamilyOptions: compression_per_level (compressor_per_level), blob_compression (blob_compressor)
- CompactionOptions: compression (compressor)
The existing CompressionType/CompressionOptions options are preserved for backward compatibility. If the user doesn't specify compressors (leaving them null), the CompressionType/CompressionOptions options are used as before. Otherwise, compressors override the existing options.
A new constant kPluginCompression is defined in CompressionType for plugin compressors. The SST properties block stores information about the specific compressor in the compression_name field. This is used to instantiate a suitable compressor when opening the SST.
Option String Examples
Built-in compressor (existing options)
compression=kZSTD;compression_opts={level=1}
Built-in compressor (new options)
compressor={id=ZSTD;level=1}
Plugin compressor (new options)
compressor={id=my_compressor;my_option1=value1;my_option2=value2}
Options Object Example
Built-in compressor (existing options)
Options options;
options.compression = kZSTD;
options.compression_opts.level = 1;
Built-in compressor (new options)
Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
config_options,
"id=ZSTD;level=1",
&options.compressor);
Plugin compressor (new options)
Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
config_options,
"id=my_compressor;my_option1=value1;my_option2=value2",
&options.compressor);
db_bench
For db_bench, compression_type and individual compression options (such as compression_level) were left unchanged for backward compatibility. compression_type is still used with plugin compressors to specify their name. Other compressor options can be passed using compressor_options.
Built-in compressor (existing options)
--compression_type=zstd --compression_level=1
Built-in compressor (new options)
--compression_type=zstd --compressor_options="level=1"
Plugin compressor (new options)
--compression_type=my_compressor --compressor_options="my_option1=value1;my_option2=value2"
Limitations/Future Work
Compressor plugins are currently not supported for
- WAL compression: it requires adding streaming compression to the Compressor interface, as described in PR #584
- Compressed secondary cache
- BlobDB: the blob_compressor option allows passing options for built-in compressors, but plugins are not supported. Supporting plugins would require additional metadata to be stored with blob files.
These limitations will be addressed by future PRs.
@lucagiac81 please rebase
#397 should be able to help solve this problem. Using it, we can set the compressor as an option and not serialize the compression+opts if the Compressor is not set. Once your PR is merged, I suggest you open another issue to outline that issue, preferably with an associated unit test. Then I can help work on a fix.
Code is rebased on latest main based RocksDB 8.6.7 (the previous code was based on RocksDB 8.1.1). I kept some of the changes in separate commits for ease of review.
Fix vector option parsing
- A parsing error was discovered with compressor_per_level, which is a vector of Compressor (Customizable).
- The vector is serialized with an outer set of braces. For example: {{id=NoCompression; parallel_threads=1}:{id=Snappy; parallel_threads=1}}
- DefaultOptionsFormatter::ToVector calls NextToken which only removes the outer braces and returns one token with all the vector elements, instead of taking one element at a time.
- This commit resolves the issue by removing the outer set of brackets.
Add test for option override issue
- This exposes the bug described in a a previous comment in this PR (using compression and compressor options in option string).
Separate derived compression settings
- This solves the bug exposed in the previous commit.
- It adds a "derived" compressor field in MutableCFOptions, which is computed from the compressor/compression options. In this way, user selection is never overwritten by default settings.
I also tried to fix the bug using PR397 (with kUseBaseAddress), but I could not cover all cases.
- With that PR397, I’m able to update “compressor” and “compression” fields in MutableCFOptions when parsing either option in the string.
- However, I still run into issues when parsing the option file.
- I noticed that the parse order from the option file is: compression, compressor, compression_opts
- When parsing compression_opts, we need to apply the options only if the user specified the "compression" option. If the user specified "compressor", the options are already included.
- We could only keep "compressor" in the option file, but the file would become incompatible with previous versions of Speedb.
- I have the code for this solution available in a separate branch if you'd like to review it.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.