
TXN optimization

themanforfree opened this issue on Jun 13, 2023 (2 comments)

The following is benchmark data for Xline and etcd handling txn-put requests.

benchmark command:

benchmark --clients=${clients} txn-put --key-size=8 --val-size=256 --total=${total} --key-space-size=${key-space-size}

Xline

| QPS | Latency (ms) | Time (s) | clients | total | key-space-size |
|---|---|---|---|---|---|
| 324.3 | 3.1 | 6.2 | 1 | 2000 | 1 |
| 2768.4 | 3.6 | 3.6 | 10 | 10000 | 1 |
| 5429.5 | 9.2 | 3.7 | 50 | 20000 | 1 |
| 5498.6 | 18.1 | 3.6 | 100 | 20000 | 1 |
| 5485.5 | 36.3 | 5.5 | 200 | 30000 | 1 |
| 331.5 | 3.0 | 6.0 | 1 | 2000 | 100000 |
| 2700.4 | 3.7 | 3.7 | 10 | 10000 | 100000 |
| 5470.5 | 9.1 | 3.7 | 50 | 20000 | 100000 |
| 5563.0 | 17.9 | 3.6 | 100 | 20000 | 100000 |
| 5530.4 | 36.1 | 5.4 | 200 | 30000 | 100000 |

etcd

| QPS | Latency (ms) | Time (s) | clients | total | key-space-size |
|---|---|---|---|---|---|
| 935.9 | 1.1 | 2.1 | 1 | 2000 | 1 |
| 3341.5 | 3.0 | 3.0 | 10 | 10000 | 1 |
| 12219.3 | 4.1 | 1.6 | 50 | 20000 | 1 |
| 25641.9 | 3.9 | 0.8 | 100 | 20000 | 1 |
| 16737.8 | 11.9 | 1.8 | 200 | 30000 | 1 |
| 812.7 | 1.2 | 2.5 | 1 | 2000 | 100000 |
| 4034.8 | 2.5 | 2.5 | 10 | 10000 | 100000 |
| 13525.4 | 3.7 | 1.5 | 50 | 20000 | 100000 |
| 24238.7 | 4.1 | 0.8 | 100 | 20000 | 100000 |
| 30863.9 | 6.5 | 1.0 | 200 | 30000 | 100000 |

In our current implementation of Txn, there are some parts that can be optimized, as follows:

  • [ ] Read-only Txn requests do not need to go through the consensus protocol; we can serve them through the ReadState mechanism (a rough sketch of the read-only check follows this list)
  • [ ] Provide concurrent read and write capabilities at the storage layer
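
For the first item, a txn can only skip consensus if every operation on both of its branches is a read. Below is a minimal sketch of that check; the types are simplified stand-ins for the etcd-compatible RPC definitions, not Xline's actual code.

```rust
// Minimal sketch: deciding whether a TxnRequest is read-only and could be
// served without going through consensus. The types below are simplified
// stand-ins for the etcd-compatible RPC definitions, not Xline's actual code.

struct RangeRequest;
struct PutRequest;
struct DeleteRangeRequest;

enum RequestOp {
    Range(RangeRequest),
    Put(PutRequest),
    DeleteRange(DeleteRangeRequest),
    Txn(Box<TxnRequest>),
}

struct TxnRequest {
    success: Vec<RequestOp>,
    failure: Vec<RequestOp>,
}

/// A txn is read-only iff every op on both branches (including nested txns)
/// is a Range.
fn is_read_only(txn: &TxnRequest) -> bool {
    txn.success
        .iter()
        .chain(txn.failure.iter())
        .all(|op| match op {
            RequestOp::Range(_) => true,
            RequestOp::Txn(inner) => is_read_only(inner),
            _ => false,
        })
}
```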

themanforfree · Jun 13 '23

We have identified the bottleneck behind the performance degradation of Xline TXN by profiling with perf. Here are the specific details from perf:

[perf profiling result: Index::mark_available dominates the samples]

As you can see, Index::mark_available is the bottleneck. We tried removing the mark_available method and ran the benchmark again. All the tests listed below use the same benchmark command, ./benchmark put 10 100000, which means using 10 clients to put keys; the key space size is 100K.

| engine | remove mark_available | total | slowest | fastest | avg | rps | 99% |
|---|---|---|---|---|---|---|---|
| memory | no | 73.6666 s | 71.5 ms | 2.1 ms | 7.4 ms | 1357.4677 | 13.4 ms |
| memory | yes | 56.5029 s | 77.8 ms | 2.3 ms | 5.6 ms | 1769.8196 | 8.7 ms |
| rocksdb | no | 295.0114 s | 83.1 ms | 6.7 ms | 29.5 ms | 338.9699 | 49.7 ms |
| rocksdb | yes | 59.3128 s | 98.2 ms | 2.3 ms | 5.9 ms | 1685.9769 | 9.2 ms |

Phoenix500526 · Jul 17 '23

Current Problem

Currently, Xline uses a version-based storage schema to achieve the multi-version storage feature for key-value pairs. During the actual storage of a key-value pair (the process of sync_put_request), the following two steps are involved:

  1. Execute Index::insert_or_update_revision to generate a KeyRevision for the key and insert it into the Index.
  2. Convert the KeyRevision from step 1 into a Revision and store the value with the Revision as the key.

When getting a key-value pair (the process of handle_range_request), the corresponding KeyRevision needs to be read from the Index before accessing the database.
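
To make this concrete, here is a minimal sketch of the write and read paths described above, under some simplifying assumptions: the Index is modeled as a per-key list of KeyRevisions, and the database as a map keyed by Revision. All names are stand-ins for Xline's actual types.

```rust
// A minimal sketch of the two-step write path and the read path described
// above. `Index`, `KeyRevision`, `Revision`, and the database handle are
// simplified stand-ins for Xline's actual types, not the real implementation.

use std::collections::{BTreeMap, HashMap};

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Revision {
    main: i64, // store-wide revision assigned to this write
    sub: i64,  // position of the op inside a txn
}

struct KeyRevision {
    revision: Revision,
    create_revision: i64,
    version: i64,
}

struct Index {
    // key -> all revisions of that key, newest last
    inner: HashMap<Vec<u8>, Vec<KeyRevision>>,
}

struct Db {
    // Revision -> serialized KeyValue
    inner: BTreeMap<Revision, Vec<u8>>,
}

fn sync_put(index: &mut Index, db: &mut Db, key: Vec<u8>, value: Vec<u8>, rev: i64) {
    // Step 1: generate a KeyRevision for the key and insert it into the Index.
    let revs = index.inner.entry(key).or_default();
    let create = revs.first().map_or(rev, |r| r.create_revision);
    let version = revs.len() as i64 + 1;
    let kr = KeyRevision {
        revision: Revision { main: rev, sub: 0 },
        create_revision: create,
        version,
    };
    let revision = kr.revision;
    revs.push(kr);
    // Step 2: store the value with the Revision as the key. A read that runs
    // between step 1 and step 2 sees the index entry but cannot find the value yet.
    db.inner.insert(revision, value);
}

fn handle_range_one(index: &Index, db: &Db, key: &[u8]) -> Option<Vec<u8>> {
    // Read the latest KeyRevision from the Index first, then hit the database.
    let kr = index.inner.get(key)?.last()?;
    db.inner.get(&kr.revision).cloned()
}
```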

Since Xline does not implement real MVCC transactions, there is no atomicity guarantee between steps 1 and 2. This is not an issue for write requests such as PutRequest or DeleteRangeRequest, because CURP's conflict detection mechanism takes the place of transactions, ensuring that conflicting write requests do not race with each other.

However, since RangeRequests do not go through the CURP protocol, the following problem arises: if handle_range_request starts executing before step 2 of sync_put_request has completed, it can find the corresponding KeyRevision in the Index, but the corresponding KeyValue does not yet exist in the database. (Note: this problem only occurs for serializable reads; linearizable reads are not affected.)

To avoid this situation, the original approach was to add an "available" field to KeyRevision, which is only marked as available after the corresponding KeyValue has been successfully written to disk. However, this approach has performance issues: the implementation of mark_available involves complex traversal and lookup operations, leading to significant lock contention.
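
For reference, a rough sketch of what that approach looks like and where the cost comes from, extending the toy types from the sketch above; this is an illustration of the idea, not the real Index implementation.

```rust
// Sketch of the original "available"-flag approach: KeyRevision gains an
// `available: bool`, and mark_available is called after the flush. The cost
// comes from taking the same lock that every reader and writer needs and then
// scanning the key's revision list. Names are assumptions for illustration.

use std::collections::HashMap;
use std::sync::Mutex;

struct AvailKeyRevision {
    revision: Revision, // the toy Revision type from the sketch above
    available: bool,    // set only after the KeyValue has been written to disk
}

struct LockedIndex {
    inner: Mutex<HashMap<Vec<u8>, Vec<AvailKeyRevision>>>,
}

impl LockedIndex {
    fn mark_available(&self, key: &[u8], target: Revision) {
        // Lock the whole index, then linearly search for the matching revision.
        let mut map = self.inner.lock().unwrap();
        if let Some(kr) = map
            .get_mut(key)
            .and_then(|revs| revs.iter_mut().find(|kr| kr.revision == target))
        {
            kr.available = true;
        }
    }
}
```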

Possible Solutions

There are two possible solutions to this problem.

Solution One: Remove "available" and implement a compensation logic in the RangeRequest handling process

Specific steps:

  1. Remove all implementations related to "available" in the Index, including mark_available, to eliminate the expensive contention.
  2. Implement a check in the Range request handling: if a Revision is read from the Index but the corresponding KeyValue cannot be found in the database, return the old value instead (see the sketch after this list).
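
As a rough illustration of step 2, reusing the toy Revision and Db types from the earlier sketch, the fallback could look like the following; the real Range handling also has to deal with the OneKey/AllKeys/Range variants, limits, and requested revisions, which this omits.

```rust
// Sketch of the compensation logic in Solution One, reusing the toy
// `Revision` and `Db` types from the write/read-path sketch above.
// `revs` is the key's revision list as read from the Index, newest last.
fn range_with_fallback(revs: &[Revision], db: &Db) -> Option<Vec<u8>> {
    // If the newest revision's value has not been flushed yet, fall back to
    // an older value instead of reporting a missing KeyValue.
    revs.iter().rev().find_map(|rev| db.inner.get(rev).cloned())
}
```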

Pros

The modification is simple and brings a significant performance improvement, without giving up the premise of a lock-free implementation.

Cons

  1. Poor maintainability
    • The Range request has various constraints such as OneKey, AllKeys, and Range, making the judgment logic complex.
    • One of the invariants of the underlying database is that the number of Revisions read from the Index always equals the number of KeyValues read from the database. This solution breaks that invariant; in the future, if a bug causes the counts to diverge, it will be hard to distinguish a genuine error from the compensation logic.
    • It adds cognitive burden, as the problem arises in sync_put_request but the compensation logic lives in handle_range_request. Future modifications in these areas need to be handled with great care to avoid inadvertently introducing obscure bugs.

Solution Two: Change the operation logic

Change the order of the storage operations: instead of updating the Index first and then flushing to disk, flush to disk first and update the Index only after the flush has succeeded.

Specific steps:

  1. Change the Index lock to an RwLock and split insert_or_update_revision in the Index into register_revision (a read-only operation that computes the key to be flushed) and insert_revision (a write-only operation that updates the index).
  2. Modify the logic of sync_put_request: first use register_revision to register the corresponding key, then generate the corresponding ops to be executed by the CommandExecutor (CE).
  3. The CE executes flush_ops. If flush_ops contains a PutKeyValue(Revision, KeyValue) operation, it calculates the corresponding KeyRevision based on the Revision and KeyValue and inserts it into the Index.
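
A minimal sketch of this reordered flow, again reusing the toy Revision, KeyRevision, and Db types from the write/read-path sketch above; everything here is an assumption for illustration, not Xline's code.

```rust
// Sketch of Solution Two: register the revision under a shared lock, flush
// the value, and only then update the Index under an exclusive lock.

use std::collections::HashMap;
use std::sync::RwLock;

struct RwIndex {
    inner: RwLock<HashMap<Vec<u8>, Vec<KeyRevision>>>,
}

impl RwIndex {
    /// Read-only: compute the next KeyRevision for `key` without touching the index.
    fn register_revision(&self, key: &[u8], rev: i64) -> KeyRevision {
        let map = self.inner.read().unwrap();
        let (create, version) = map
            .get(key)
            .and_then(|revs| revs.first().map(|f| (f.create_revision, revs.len() as i64 + 1)))
            .unwrap_or((rev, 1));
        KeyRevision {
            revision: Revision { main: rev, sub: 0 },
            create_revision: create,
            version,
        }
    }

    /// Write-only: insert the KeyRevision once the value has been flushed.
    fn insert_revision(&self, key: Vec<u8>, kr: KeyRevision) {
        self.inner.write().unwrap().entry(key).or_default().push(kr);
    }
}

fn sync_put_reordered(index: &RwIndex, db: &mut Db, key: Vec<u8>, value: Vec<u8>, rev: i64) {
    let kr = index.register_revision(&key, rev); // 1. register (shared lock only)
    db.inner.insert(kr.revision, value);         // 2. the CE flushes the value to disk
    index.insert_revision(key, kr);              // 3. update the Index (exclusive lock)
}
```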

Pros

  1. After changing the Index lock to an RwLock, concurrency between multiple RangeRequests improves.
  2. Easy to maintain: the change is transparent to upper-level business logic and does not intrude into unrelated request-processing flows such as handle_range_request. It also does not break the invariant of the underlying database.
  3. Resolves, at the root cause, the atomicity problem between Index updates and disk flushes, and can later be optimized further with a lock-free solution.

Cons

  1. Extensive modifications, as it involves changes to some underlying processes and data structures, leading to changes cascading through the codebase.
  2. The performance improvement may not be as significant as Solution One because there is still lock contention. However, compared to the original solution, there will be improvements in two aspects:

Phoenix500526 · Jul 21 '23