sto icon indicating copy to clipboard operation
sto copied to clipboard

Add transactional ART

Open zyedidia opened this issue 5 years ago • 0 comments

Here's a PR for the work I did on making a transactional ART. I hope the info below is helpful. I'll leave it to you to merge this code/portions of the code however you would like.

Main modifications:

New ART/ directory Class art_index in DB_index.hh unit-artindex.cc test in test/

How to use

If you'd like to test it I believe that the YCSB_bench.hh file currently sets the ART as the data structure to test.

The art_index defined and implemented in DB_index.hh provides the same interface to the transactional ART that ordered_index does for the transactional masstree.

In other words, art_index implements:

  • select_row
  • insert_row
  • update_row
  • delete_row
  • nontrans_get
  • nontrans_put

All these functions are implemented for values with column accesses as well.

Range scan has been implemented on a separate branch in zyedidia/sto and seems to have some issues in the TPCC benchmark regarding endianness. I think these issues have to do with me misunderstanding the specification rather than a bug. Feel free to take a look at the code there.

Implementation

There are two elements to the implementation: first the underlying ART implementation which is just an implementation of an ART with optimistic lock-coupling provided by the makers of the ART. This has been modified some to reveal some more information so that the second element which is the transactional wrapper can access some needed information like the children of non-leaf nodes, and whether or not a node was upgraded during an insert.

Underlying ART

The main ART lives in ART/. This code is modified from the code for the ART written by the authors of the original paper. The original source code can be found at: https://github.com/flode/ARTSynchronized. That github repository implements a serial ART, an optimistic lock coupling (OLC) ART, and a ROWEX locking ART. Our ART/ directory contains an adapted version of the OLC ART.

Our modifications consist mainly of exposing internal ART nodes in calls to insert, lookup, and delete, as well as adding a version number to every node in the ART.

The ART implementation stores TIDs (uint64_ts) using the Key type defined in Key.h. This is simply a 128-byte buffer which uses the heap if the key exceeds 128 bytes. The loadKey function must also be defined. This function takes a TID and an empty key and populates the key with the actual key for that TID. This means that every value must store its key as well.

I store values in art_index as internal_elems which contain the key field, storing an lcdf::Str object. See art_index.load_key which populates the passed in reference to an ART key with the lcdf::Str data.

Insert

The insert function has been modified to take a make_tid callback which will allocate a new user element element and return the TID for the ART to store. This function may not be called on every insert. It will not be called if the key already exists in the tree.

Insert returns 3 values as a tuple: (tid, parent, old). The tid value is the TID of the node at that key. If the key was just inserted, tid will correspond to the new tid. If the key already existed, the tid of the key is returned. Parent is the node struct corresponding to the parent of the inserted node. It is null if the key already existed before the insert. Old is an obsolete node that should be deleted. This is usually null but if the insert caused a node upgrade then that node should be deleted but some extra precautions need to be made.

Lookup

ART's lookup takes a standard ART key and returns the TID (if found, 0 otherwise), and the TID's parent node. If the key was not found then it returns the node at the furthest point in the traversal where it determined the key was not in the tree.

Remove

Remove takes an ART key and the TID to remove and returns a possible old node. This old node may be null but if it has a value then it was made obsolete during a shrink and should be deleted.

RCU

I removed all "Epoch" RCU calls in the code and replaced them with calls that return nodes to be deleted to the art_index to rcu delete them using STO's mechanism. I believe there may be still a little work to be done regarding RCU deletion on destruction of the entire ART.

ART index

I simply took the ordered_index data structure which wraps masstree and replaced the masstree calls with ART calls.

On top of modifying the existing ordered index structure to use the ART instead of masstree, some code has also been added to deal with node upgrading in the ART.

Tests

The art_index passes the unit tests in test/unit-dboindex.cc but those tests are not especially rigorous. I wrote some more tests targeting specific functionality in the ART (for example, making sure node upgrading aborts properly). These new tests are located in test/unit-artindex.cc. The tests shouldn't be implementation specific and the file also contains a masstree wrapper so the tests can be executed on masstree as well (though the tests specifically test issues that would occur in ART rather than masstree).

Benchmarks

The ART implementation outperforms masstree in some of the benchmarks I ran, including YCSB. The ART range scan isn't faster than masstree's however.

Extra stuff

A lot of time during the summer was spent working on the TART which can be found in datatype/TART.hh. This is an implementation of a transactional ART that I wrote from scratch but was unable to port to the interface required by the benchmarks (the interfaced defined by ordered_index). Once I had a more in-depth understanding of how a TART might be written and how Masstree worked, I was able to take ordered_index and modify the references to masstree to instead call to the modified underlying ART. That code is included in this PR but at this point is unused.

zyedidia avatar Dec 05 '18 23:12 zyedidia