kvrocks
Improve consistency and isolation semantics by adding Context parameter to DB API
Search before asking
- [X] I had searched in the issues and found no similar issues.
Motivation
The current DB API may issue multiple read operations or make nested calls within a single operation. These reads do not share a fixed snapshot, so one operation may observe data from different snapshots, which causes inconsistency.
Solution
Referring to the current LatestSnapshot and GetOptions, we can add a Context parameter to each DB API, through which the API can pass a definite snapshot.
After a few simple attempts, I found that the changes tend to ripple: multiple modules have to change their APIs at the same time, which can produce a huge PR that is difficult to break down into smaller PRs. I plan to post a draft PR in the near future to give a rough idea of the approach, and then gradually refine the changes to each module.
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
Need some help:
struct Context {
  explicit Context(engine::Storage *storage) : storage_(storage), snapshot_(storage->GetDB()->GetSnapshot()) {}
  engine::Storage *storage_ = nullptr;
  const rocksdb::Snapshot *snapshot_ = nullptr;
  rocksdb::WriteBatchWithIndex* batch_ = nullptr;
  Context() = default;
  rocksdb::ReadOptions GetReadOptions();
  const rocksdb::Snapshot *GetSnapShot();
};
This is the general idea: pass the Context to the Database API and use a fixed snapshot for the call; this snapshot will not change during the entire call. When we need to read data written by the current operation, we use WriteBatchWithIndex + GetFromBatchAndDB, which looks in the batch first and then in the DB: batch => db(snapshot=ctx.snapshot).
Most operations currently use WriteBatch. Is there any way to copy the operation sequence from WriteBatch to ctx.WriteBatchWithIndex? That way, I would only need to modify the final Write step in storage, instead of updating ctx.WriteBatchWithIndex every time the DB API modifies a WriteBatch.
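To make the intended read path concrete, here is a minimal sketch that models it with standard containers only. It is a toy under stated assumptions, not RocksDB code: ToyContext, PendingWrite, and Get are hypothetical names, the snapshot is modeled as a plain map copied when the context is created, and the indexed batch as a map of pending writes. It only illustrates the lookup order batch => snapshot.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one entry of an indexed write batch.
struct PendingWrite {
  bool is_delete;     // true if the current operation deleted the key
  std::string value;  // meaningful only when is_delete is false
};

// Hypothetical stand-in for the proposed Context. These are NOT RocksDB types.
struct ToyContext {
  std::map<std::string, std::string> snapshot;  // fixed for the whole call
  std::map<std::string, PendingWrite> batch;    // uncommitted writes of this call
};

// Reads a key the way the proposal describes: the batch is consulted first,
// and only if the key is absent there does the read fall back to the snapshot.
// Returns false when the key does not exist from this operation's point of view.
bool Get(const ToyContext &ctx, const std::string &key, std::string *value) {
  std::map<std::string, PendingWrite>::const_iterator b = ctx.batch.find(key);
  if (b != ctx.batch.end()) {
    if (b->second.is_delete) return false;  // a pending delete hides the snapshot value
    *value = b->second.value;
    return true;
  }
  std::map<std::string, std::string>::const_iterator s = ctx.snapshot.find(key);
  if (s == ctx.snapshot.end()) return false;
  *value = s->second;
  return true;
}
```

In this model, a key overwritten in the batch returns the new value, a key deleted in the batch is reported as missing even though the snapshot still holds it, and every other key is served from the same snapshot for the whole call.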
rocksdb has the relation below:
WriteBatchBase
WriteBatchWithIndex : WriteBatchBase
WriteBatch : WriteBatchBase
Should we switch to WriteBatchBase in some places? Besides, WriteBatchWithIndex has a WriteBatch* GetWriteBatch() override; interface here.
Correction: because there may be multiple Writes, their operations should be appended to ctx.WriteBatchWithIndex rather than simply copied.
@PokIsemaine I've checked that most output uses GetWriteBatch with a WriteBatchBase. Would that be OK for the scenario here?
I think there are currently two ways:
- Keep the current WriteBatch, then call WriteBatch::Iterate(&handler) at Write time, and follow batch_debugger.h to write a WriteBatch::Handler that appends the WriteBatch operations to WriteBatchWithIndex one by one.
- Return a WriteBatchWithIndex from GetWriteBatchBase. However, I found that WriteBatchWithIndex does not support all WriteBatch operations, such as DeleteRange. Even if we call DeleteRange on the WriteBatch returned by WriteBatchWithIndex::GetWriteBatch, GetFromBatchAndDB cannot see the effect of that DeleteRange in the batch.
For the DeleteRange operation, maybe we need to switch to a loop of Delete calls, but I don't know whether this would have a big performance impact. With the first approach, we only add operations to the batch without performing the Write; what would happen in that case?
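The first approach, combined with the loop-of-Delete idea for DeleteRange, can be sketched with a toy model. This is not RocksDB code: in RocksDB the replay would go through WriteBatch::Iterate with a WriteBatch::Handler, whereas here the recorded batch is just a vector of hypothetical Op records, and Entry, ToyIndex, and AppendToIndex are made-up names. The range delete is expanded into point deletes over keys found in the snapshot or already in the index, which is an assumption about how the expansion could work.

```cpp
#include <map>
#include <string>
#include <vector>

enum class OpType { kPut, kDelete, kDeleteRange };

// One recorded operation (hypothetical). For kDeleteRange, `key` is the
// inclusive start of the range and `value` is its exclusive end.
struct Op {
  OpType type;
  std::string key;
  std::string value;
};

struct Entry {
  bool is_delete;
  std::string value;
};

typedef std::map<std::string, Entry> ToyIndex;  // stand-in for an indexed batch

// Appends the operations of one batch to `index`, in order. Calling it once
// per Write accumulates several batches instead of overwriting the index.
void AppendToIndex(const std::vector<Op> &ops,
                   const std::map<std::string, std::string> &snapshot,
                   ToyIndex *index) {
  for (const Op &op : ops) {
    switch (op.type) {
      case OpType::kPut:
        (*index)[op.key] = Entry{false, op.value};
        break;
      case OpType::kDelete:
        (*index)[op.key] = Entry{true, ""};
        break;
      case OpType::kDeleteRange: {
        // Expand the range into point deletes for keys visible in the snapshot.
        for (std::map<std::string, std::string>::const_iterator it = snapshot.lower_bound(op.key);
             it != snapshot.end() && it->first < op.value; ++it) {
          (*index)[it->first] = Entry{true, ""};
        }
        // Keys written earlier in this operation must be hidden as well.
        for (ToyIndex::iterator it = index->lower_bound(op.key);
             it != index->end() && it->first < op.value; ++it) {
          it->second = Entry{true, ""};
        }
        break;
      }
    }
  }
}
```

The cost of the expansion grows with the number of keys in the range, which is the performance concern raised above; the sketch does not try to measure or avoid it.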
Some other questions:
- What isolation level can we expect if kvrocks requests are processed by multiple threads without using transactions? Serializable, snapshot isolation, or something else?
- What resources are specifically protected by LockGuard for write operations?
(2) LockGuard protects the keys involved in the operation. Before writing, it first collects the keys it intends to write, then locks all of them.
(1) is an interesting problem. I think we're aiming for SI (snapshot isolation). It cannot avoid concurrent read-modify-write sequences on the same key, but it would make single-write and multi-write operations see the same snapshot.
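As an illustration of the key locking described in (2), here is a minimal sketch of a guard that collects the keys of a write and locks all of them. It is not the kvrocks implementation: ToyLockManager and ToyMultiLockGuard are hypothetical names, and hashing keys onto a fixed number of mutexes and locking them in ascending index order (to avoid deadlock between overlapping key sets) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical lock manager: keys are hashed onto a fixed set of mutexes.
class ToyLockManager {
 public:
  // `stripes` must be greater than zero.
  explicit ToyLockManager(std::size_t stripes) : mutexes_(stripes) {}

  // Returns the sorted, de-duplicated mutex indexes covering `keys`.
  std::vector<std::size_t> StripesFor(const std::vector<std::string> &keys) const {
    std::vector<std::size_t> idx;
    for (const std::string &k : keys) {
      idx.push_back(std::hash<std::string>()(k) % mutexes_.size());
    }
    std::sort(idx.begin(), idx.end());
    idx.erase(std::unique(idx.begin(), idx.end()), idx.end());
    return idx;
  }

  // Locks in ascending index order so two guards cannot deadlock each other.
  void Lock(const std::vector<std::size_t> &stripes) {
    for (std::size_t i : stripes) mutexes_[i].lock();
  }

  // Unlocks in reverse order of acquisition.
  void Unlock(const std::vector<std::size_t> &stripes) {
    for (std::size_t i = stripes.size(); i > 0; --i) mutexes_[stripes[i - 1]].unlock();
  }

 private:
  std::vector<std::mutex> mutexes_;
};

// Hypothetical RAII guard: holds the locks for all keys of one write operation.
class ToyMultiLockGuard {
 public:
  ToyMultiLockGuard(ToyLockManager *manager, const std::vector<std::string> &keys)
      : manager_(manager), stripes_(manager->StripesFor(keys)) {
    manager_->Lock(stripes_);
  }
  ~ToyMultiLockGuard() { manager_->Unlock(stripes_); }
  ToyMultiLockGuard(const ToyMultiLockGuard &) = delete;
  ToyMultiLockGuard &operator=(const ToyMultiLockGuard &) = delete;

 private:
  ToyLockManager *manager_;
  std::vector<std::size_t> stripes_;
};
```

A guard like this serializes writers that touch the same keys, but it does nothing for readers, which is why the fixed snapshot in Context is still needed for consistent reads.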