rocksdb
rocksdb copied to clipboard
Unable to get all records while traversing database
Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://groups.google.com/forum/#!forum/rocksdb or https://www.facebook.com/groups/rocksdb.dev
Expected behavior
We have written millions of records into the database, and these records are prefixed with 'xxx' or 'yyy', such as: 'xxx0', 'xxx1', 'yyy0', 'yyy1', etc. In addition, we have customized the comparator to ensure that the records prefixed with 'xxx' are ranked before 'yyy'. In the case of the same prefix, the smaller number is ranked first.
When we iterate in the following way, we should be able to get all the records prefixed with 'xxx':
rocksdb::Iterator* iter = db->NewIterator(rocksdb::ReadOptions());
for (iter->Seek("xxx"); iter->Valid() && iter->key().starts_with("xxx"); iter->Next()) {
// print key and value
}
Actual behavior
In fact, the records between 'xxx100' and 'xxx1001' are missing from the result of the traversal. But if we specify access to 'xxx105' in the following way, we can get the corresponding value:
std::string value;
rocksdb::Status s = db->Get(rocksdb::ReadOptions(), "xxx105", &value);
Steps to reproduce the behavior
- Define the comparator, and write the data starting with prefix, we wrote about four million records.
- Traverse the database by seeking prefix.
The above anomalies are occasional.
In addition, is there an easy way to replace the comparator in the ldb tool. Currently, I cannot directly view the database written by myself using ldb. The error message I get is as follows: Failed: Invalid argument: leveldb.BytewiseComparator: does not match existing comparator
What version of RocksDB are you using?
When you say the anomalies are "occasional", do you mean that only certain keys are missing or that sometimes a given iterator will return the keys and another one might not? If the latter, is compaction/flush happening when the problem is observed?
Are the keys that are missing in the middle of a range (e.g missing xxx103 but say xxx102 and xxx104) or at the beginning or end?
ldb.BytewiseComp
The version we are using is 6.19.
Occasional here means that when we found that the traversed data is partially missing, we have repeatedly tried to reproduce according to the following steps:
- Open a brand new database.
- Write about 100,000 records to it, of course, these records are also prefixed with 'xxx' and 'yyy'.
- Traverse the database by prefix 'xxx'.
Everything works fine when we iterate through the new database. But when it comes to the problematic database, we always lose data between xxx100 and xxx1001. This range has never changed.
These missing keys are in the middle.
Everything works fine when we iterate through the new database. But when it comes to the problematic database, we always lose data between xxx100 and xxx1001.
does the issue persist after re-open the problematic db?
In addition, is there an easy way to replace the comparator in the ldb tool. Currently, I cannot directly view the database written by myself using ldb. The error message I get is as follows: Failed: Invalid argument: leveldb.BytewiseComparator: does not match existing comparator
as the user defined comparator is code, there's no easy way to provide that in command line (you will have to change the ldb code). But you should still be able to dump each SST content with ldb dump
or sst_dump
.
does the issue persist after re-open the problematic db?
Yes,even if we reopen the database for traversal, there is still no data between xxx100 and xxx1001.
as the user defined comparator is code, there's no easy way to provide that in command line (you will have to change the ldb code). But you should still be able to dump each SST content with
ldb dump
orsst_dump
.
We used ldb dump
and sst_dump
to scan the *.log and *.sst files, and found all the keys.
The attachment is our data and the test tool we wrote ourselves. For the specific usage method, please refer to README.md. db_related.zip
@jay-zhuang @ajkr Is there any way to investigate this issue? We are willing to provide more detailed information.
@hemin1020 : regarding this custom Comparator code you shared...
int Compare(const rocksdb::Slice& a, const rocksdb::Slice& b) const override {
// "extentX" < "streamX"
if(a[0] > b [0]) {
return -1;
} else if (a[0] == b[0]) {
// "extentX" < "extentY"
// "streamX" < "streamY"
rocksdb::Slice aa(a);
rocksdb::Slice bb(b);
aa.remove_prefix(6);
bb.remove_prefix(6);
size_t left = atol(aa.data());
size_t right = atol(bb.data());
if (left == right) {
return 0;
} else if (left < right) {
return -1;
}
return 1;
}
return 1;
}
There may be a few issues with this code.
First of all, atol
expects a null-terminated string, but a rocksdb::Slice is not necessarily null-terminated (in the majority of cases, it won't). So the atol
may actually read over the bounds of the Slice, causing undefined behavior.
Second, there is an unconditional aa.remove_prefix(6)
in the comparator. This is fine if all keys are at least 6 bytes long.
However, if you are passing xxx
as a start key for a Seek operation, it's just 3 bytes. Then the remove_prefix(6)
will cause undefined behavior.
I can't say for sure that undefined behavior is happening in your particular case, but it may be worth to assert in the comparator that the input Slices a
and b
are at least 6 bytes long. It will also be useful to add some protective measures for not reading over the bounds with atol
.
Finally, atol
produces a long, which may or may not be large enough to hold the numeric representation of the digts in your keys.
When in doubt, I recommend also to build with ASan/UBSan to check if it reports any such issues. Hope this helps.