[bugfix] compare key equality in blob file reader using user comparator instead of simply using equal sign
With custom comparator which only compare part of user_key, record.key is always not equal to user_key in BlobFileReader::VerifyBlob(). For example, user_key is always suffixed with ttl, and custom comparator without ttl like below would make record.key not equal to user_key.
const size_t TTL_LEN_IN_BYTES = 8;
class NoTTLComparator: public rocksdb::Comparator {
public:
static const char* kClassName() { return "NoTTLComparator"; }
const char* Name() const override {return kClassName();}
int Compare(const rocksdb::Slice& a, const rocksdb::Slice& b) const override {
rocksdb::Slice a_no_ttl = rocksdb::Slice(a.data(), a.size()-TTL_LEN_IN_BYTES);
rocksdb::Slice b_no_ttl = rocksdb::Slice(b.data(), b.size()-TTL_LEN_IN_BYTES);
return rocksdb::BytewiseComparator()->Compare(a_no_ttl, b_no_ttl);
}
void FindShortestSeparator(std::string* start, const rocksdb::Slice& limit) const {};
void FindShortSuccessor(std::string* key) const {};
};
Hi @lnparker!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@lnparker Would you be able to make format and update the PR? Then I will reimport this for internal review.
@lnparker has updated the pull request. You must reimport the pull request before landing.
@lnparker has updated the pull request. You must reimport the pull request before landing.
@lnparker Would you be able to
make formatand update the PR? Then I will reimport this for internal review.
@jaykorean already done. thanks
@bigfootjon has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bigfootjon has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bigfootjon has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Blob verification exactly needs to do a byte by byte check. If you are doing user defined ttl with the example customer comparator, how ttl is encoded into a user key should be completely opaque to RocksDB and consistently followed through out all DB read/write invocations. All keys provided via DB::Write(key, value), DB::Get(key) should have its key format follow the expectation of the Comparator and the last 8 bytes being TTL. Otherwise, it will by design fail. Both record_slice.user_key and user_key are expected to have TTL already included in it and blob verification expect them to be exactly the same.
@jowlyzhang @jaykorean Hi, in my case, when setting the key via DB::Write(key, value), the key is suffixed with timestamp like (now() +36000). When getting the key via DB::Get(key), we can not confirm the exact timestamp, so we get the key suffixed with empty timestamp. In another scenario, one key can be written repeatedly suffixed with different timestamp, but we want to treat them to be the same key considering the following compaction. Using the custom comparator provided before, [userkey][timstamp:0] is equal to [userkey][timestamp:now()+36000], but record_slice.user_key and user_key is not exactly the same. Can this case be supportted?
when setting the key via DB::Write(key, value), the key is suffixed with timestamp like (now() +36000). When getting the key via DB::Get(key), we can not confirm the exact timestamp, so we get the key suffixed with empty timestamp.
Get(key) is a point look up that is intended to look for the exact match. A range scan is needed in this case with the key without ttl part as prefix. That gets all versions of keys.
Making a custom comparator to treat [userkey][timstamp:0] and [userkey][timestamp:now()+36000] as the same keys is a hack that would break RocksDB invariants, such as this blob verification. With that said, why not encode ttl into the value, as opposed to key so that point look up can work. It seems like you do not want to retain multiple versions of the same key with different timestamps. RocksDB's built-in ttl is doing similar thing to encode time into value: https://github.com/facebook/rocksdb/wiki/Time-to-Live
when setting the key via DB::Write(key, value), the key is suffixed with timestamp like (now() +36000). When getting the key via DB::Get(key), we can not confirm the exact timestamp, so we get the key suffixed with empty timestamp.
Get(key) is a point look up that is intended to look for the exact match. A range scan is needed in this case with the key without ttl part as prefix. That gets all versions of keys.
Making a custom comparator to treat [userkey][timstamp:0] and [userkey][timestamp:now()+36000] as the same keys is a hack that would break RocksDB invariants, such as this blob verification. With that said, why not encode ttl into the value, as opposed to key so that point look up can work. It seems like you do not want to retain multiple versions of the same key with different timestamps. RocksDB's built-in ttl is doing similar thing to encode time into value: https://github.com/facebook/rocksdb/wiki/Time-to-Live
@jowlyzhang hi, dear jowly,
- Actually, we indeed use rocksdb::Scan instead of rocksdb::Get with rocksdb::Seek to [user_key][].
- Without blobdb, my case all works fine with rocksdb, could this custom comparator case be supportted with blobdb?
- https://github.com/facebook/rocksdb/wiki/Time-to-Live is db level, but user comparator is column family level.
- In my case, compation filter without getting the big blob value can be effectively used to remove expired data. Thanks!
- Without blobdb, my case all works fine with rocksdb, could this custom comparator case be supportted with blobdb?
If you use Iterator to seek to the key without timestamp part, and blob db, this blob verification part should work fine too, assuming you are using the RocksDB built in BytewiseComparator. You don't need a customized comparator.