User Defined Timestamp support in the index of WriteBatchWithIndex

Open muthukrishnan24 opened this issue 2 years ago • 3 comments

Need User Defined Timestamp support in WriteBatchWithIndex

muthukrishnan24 avatar Jul 27 '22 09:07 muthukrishnan24

Thanks @muthukrishnan24 for the interest. WriteBatchWithIndex is currently compatible with user-defined timestamps in that you can write a key to a WBWI object and RocksDB will internally allocate space for the timestamp, to be filled in later. The index structure of WBWI is not timestamp-aware. The WBWI is mostly used to buffer local writes. By the read-your-own-writes principle, all data in the WBWI is considered newer than any committed data in the database. Furthermore, in most cases we care about, the timestamp is not known until we decide to commit the WBWI to the database. We therefore intentionally did not add the timestamp to the index structure of WBWI, since doing so would add non-negligible overhead.

riversand963 avatar Jul 27 '22 18:07 riversand963
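
To make the "allocate space now, fill the timestamp later" idea concrete, here is a minimal toy model in standard C++. It is a sketch only and not RocksDB code: `ToyBatch`, `ToyEntry`, `TimestampOf`, and `UserKeyOf` are hypothetical names, the timestamp is assumed to be a 64-bit integer, and it is stored in native byte order for brevity rather than in RocksDB's own encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumption: timestamps are 8-byte integers appended to the user key.
constexpr std::size_t kTsSize = sizeof(std::uint64_t);

// Hypothetical buffered write: user key plus reserved timestamp bytes.
struct ToyEntry {
  std::string key_with_ts;
  std::string value;
};

// Hypothetical stand-in for a write batch; not RocksDB's real layout.
struct ToyBatch {
  std::vector<ToyEntry> entries;

  // The commit timestamp is unknown here, so reserve zero-filled space for it.
  void Put(const std::string& user_key, const std::string& value) {
    entries.push_back(ToyEntry{user_key + std::string(kTsSize, '\0'), value});
  }

  // Once the commit timestamp is known, overwrite the reserved bytes
  // of every buffered entry in place.
  void UpdateTimestamps(std::uint64_t ts) {
    for (ToyEntry& e : entries) {
      assert(e.key_with_ts.size() >= kTsSize);
      std::memcpy(&e.key_with_ts[e.key_with_ts.size() - kTsSize], &ts, kTsSize);
    }
  }
};

// Read back the trailing timestamp bytes of an entry.
std::uint64_t TimestampOf(const ToyEntry& e) {
  std::uint64_t ts = 0;
  std::memcpy(&ts, e.key_with_ts.data() + e.key_with_ts.size() - kTsSize, kTsSize);
  return ts;
}

// Strip the trailing timestamp bytes to recover the user key.
std::string UserKeyOf(const ToyEntry& e) {
  return e.key_with_ts.substr(0, e.key_with_ts.size() - kTsSize);
}
```

In this model an index over the buffered keys never needs to look at the timestamp bytes, because every entry receives the same timestamp at commit time.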

RocksDB will internally allocate space for timestamp to be filled later

Is it possible to fill in the timestamp before db.Write with WriteBatchWithIndex, the way WriteBatch has the UpdateTimestamps API?

muthukrishnan24 avatar Aug 08 '22 03:08 muthukrishnan24

Can you call WriteBatchWithIndex::GetWriteBatch() and then call the UpdateTimestamps() API on the underlying write batch?

riversand963 avatar Aug 09 '22 00:08 riversand963

Yes, it works

muthukrishnan24 avatar Sep 06 '22 10:09 muthukrishnan24

With timestamps enabled on the default CF, WriteBatchWithIndex::Put(default_cf_handle, "foo", "bar") returns the error "Default cf timestamp size mismatch".

From what I can see, default_cf_ts_sz needs to be set, but WriteBatchWithIndex uses backup_index_comparator.timestamp_size as the default.

muthukrishnan24 avatar Sep 06 '22 13:09 muthukrishnan24

See https://github.com/facebook/rocksdb/blob/main/include/rocksdb/utilities/write_batch_with_index.h#L89:L92, which says:

backup_index_comparator: the backup comparator used to compare keys within the same column family, if column family is not given in the interface, or we can't find a column family from the column family handle passed in, backup_index_comparator will be used for the column family

The column_family argument may be omitted for certain WriteBatchWithIndex APIs; when it is, the writes are buffered for the default column family. Therefore, as the API comment says, the default column family's comparator needs to be passed to the WBWI constructor as the backup_index_comparator argument.

riversand963 avatar Sep 08 '22 22:09 riversand963
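
The mismatch discussed above can be illustrated with a small toy model in standard C++. This is a simplified sketch of the behavior as described in this thread, not RocksDB's implementation: `ToyComparator`, `ToyColumnFamily`, and `ToyWbwi` are hypothetical types, and the only assumption carried over is that the default column family's expected timestamp size is taken from the backup comparator given to the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical comparator that only records its timestamp size in bytes.
struct ToyComparator {
  std::size_t timestamp_size;
};

// Hypothetical column family handle; id 0 plays the role of the default CF.
struct ToyColumnFamily {
  std::uint32_t id;
  ToyComparator comparator;
};

// Hypothetical stand-in for WriteBatchWithIndex.
class ToyWbwi {
 public:
  // The expected timestamp size of the default CF comes from the backup comparator.
  explicit ToyWbwi(ToyComparator backup_index_comparator)
      : default_cf_ts_sz_(backup_index_comparator.timestamp_size) {}

  // Returns an empty string on success, otherwise an error message.
  std::string Put(const ToyColumnFamily& cf, const std::string& key,
                  const std::string& value) {
    if (cf.id == 0 && cf.comparator.timestamp_size != default_cf_ts_sz_) {
      return "Default cf timestamp size mismatch";
    }
    writes_.emplace_back(key, value);
    return "";
  }

  // Number of writes buffered so far.
  std::size_t size() const { return writes_.size(); }

 private:
  std::size_t default_cf_ts_sz_;
  std::vector<std::pair<std::string, std::string>> writes_;
};
```

In this model, constructing `ToyWbwi` with a comparator whose `timestamp_size` is 0 and then writing to a default column family that uses 8-byte timestamps produces the error, while constructing it with the default column family's own comparator lets the write through.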

Thanks for the clarification, @riversand963.

I've created PR #10529, which adds UpdateTimestamps and new create functions for WriteBatch and WriteBatchWithIndex to the C API.

muthukrishnan24 avatar Sep 09 '22 05:09 muthukrishnan24