orientdb
orientdb copied to clipboard
Implement migration from component locking to page locking in TXs
Implement Wait-For graph deadlock detection and migration for component locking to page locking in TXs. Use StampledLock implementation as a base for this new lock implementation to perform read queries on BTress and probably clusters without using memory barriers on X86 architecture.
We do not need to roll back TXs in case of deadlock. We just need to replay it again using the operation log provided on the Tx commit.
The following changes need to be made:
- Keep a version of the page inside the page itself.
- Every time the page is updated, the version is also updated.
- All read operations are protected by version. If the version is changed, the read operation is repeated on the component level.
All write locks of pages are stored in separate HashMap during the atomic operation (transaction) duration. Every page change is logged before it is applied. All pages are locked by calling tryLock, and the Wait-For graph is built for all such calls. If tryLock is failed, then a check of the wait-for graph is performed. Timeout for tryLock has to be provided as a setting, and the default value is 100 us.
Validation of read operation:
- Load the page and store its current version in the list of versions and pages visited during the current atomic operation. During usage in component operation, the page is protected from eviction, so we can store both cache entry and its version as a pair, which will allow us to avoid repeating cache lookup later.
- Every read of the data unit from the page (primitive type, array) causes a version check after the read. If the read fails, the operation is repeated. (such a check is expected to be very quick because, for the duration of handling of the page, page versions are likely will be cached in the CPU L1 cache).
- At the end of the component operation, pairs (cache entry, version) are iterated, and page versions are rechecked to ensure read operation consistency. Once the check is passed, the operation is successfully returned.
There is an additional version check during jumps between pages. After loading the page pointed by the previous page, the version of the previous page is checked after the page is loaded to avoid processing of pages loaded by stale pointers.
We could lock a single page for reading during its handling to avoid constant checking of versions, but that will introduce a membar that I would like to avoid as much as possible. So we can try both ways and choose the best.
It is worth noting that if the database component is implemented as a batch of several data structures, all pages from them should be validated as a single unit.