rocksdb
rocksdb copied to clipboard
Add a new int64 add merge operator, also with Java API
Supercedes https://github.com/facebook/rocksdb/pull/6884 and https://github.com/facebook/rocksdb/pull/6877 (cc @adamretter )
This branch/PR is based on a copy of the code in https://github.com/facebook/rocksdb/pull/6884, which given the age of the PR we chose to restart, though the core of the merge operator is carried over.
This merge operator adds int64_t values. At a cursory glance, this may appear similar to UInt64AddOperator, but this merge operator handles signed numbers. It can also handle both Addition or Subtraction by merging either positive or negative integers.
Values are stored into the database using 8-bit variable-byte encoding (as suggested by @pdillinger). Because the length is known, we do not require a bit flag for continuation. So for smaller positive numbers the value stored has the advantage of also consuming less bytes than UInt64AddOperator. The most significant stored byte is a 2s complement value which is sign-extended on decoding.
Therefore the range -128..127 can be stored in 1 bytes, -32768..32767 in 2 bytes, etc. Zig-zag encoded values are in turn encoded as varints in the data. Varints now implement the 8-bit variable-length encoding (known length) suggested by @pdillinger.
A Java API to this merge operator has been added. In addition, a small Java utility is provided which implements the same encoding as the merge operator.
Changing the UInt64MergeOperator to use a similar varint encoding would require that a "new" operator be created, in order to avoid backward compatibility breaking.
- The Makefile now builds a single
merge_operators_test
binary which combines the GTEST for this merge operator with that of stringappend. This addresses your concerns re additional test binaries. - I have regenerated TARGETS as per instructions.
@dannyhchen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@anand1976 @adamretter rebased and once again ready for review/import.
@anand1976 rebased and once again ready for review/import.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@anand1976 I have once again rebased this and it is ready for review/import.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@anand1976 I have seen a one-off failure https://app.circleci.com/pipelines/github/facebook/rocksdb/34077/workflows/ca443186-ecf3-46ba-a6ca-93c051d1a363/jobs/695907 in a check introduced recently by https://github.com/facebook/rocksdb/pull/11812 - would you be able to comment on whether I have geneuinely broken something ? I suspect that given that the check is recent it may not always be correct ?
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
Zig-zag encoded values are in turn encoded as varints in the data. Varints now implement the 8-bit variable-length encoding (known length)
Why do we need to zig-zag? Now that the value doesn't encode the length, couldn't we just drop the most significant bytes that can be recovered through sign extension?
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.
@alanpaxton has updated the pull request. You must reimport the pull request before landing.