sto
sto copied to clipboard
Add transactional ART
Here's a PR for the work I did on making a transactional ART. I hope the info below is helpful. I'll leave it to you to merge this code/portions of the code however you would like.
Main modifications:
New ART/
directory
Class art_index
in DB_index.hh
unit-artindex.cc
test in test/
How to use
If you'd like to test it I believe that the YCSB_bench.hh
file currently sets the ART as the data structure to test.
The art_index
defined and implemented in DB_index.hh
provides the same interface
to the transactional ART that ordered_index
does for the transactional masstree.
In other words, art_index
implements:
-
select_row
-
insert_row
-
update_row
-
delete_row
-
nontrans_get
-
nontrans_put
All these functions are implemented for values with column accesses as well.
Range scan has been implemented on a separate branch in zyedidia/sto and seems to have some issues in the TPCC benchmark regarding endianness. I think these issues have to do with me misunderstanding the specification rather than a bug. Feel free to take a look at the code there.
Implementation
There are two elements to the implementation: first the underlying ART implementation which is just an implementation of an ART with optimistic lock-coupling provided by the makers of the ART. This has been modified some to reveal some more information so that the second element which is the transactional wrapper can access some needed information like the children of non-leaf nodes, and whether or not a node was upgraded during an insert.
Underlying ART
The main ART lives in ART/
. This code is modified from the code for the ART written
by the authors of the original paper. The original source code can be found at:
https://github.com/flode/ARTSynchronized. That github repository implements a
serial ART, an optimistic lock coupling (OLC) ART, and a ROWEX locking ART. Our ART/
directory contains an adapted version of the OLC ART.
Our modifications consist mainly of exposing internal ART nodes in calls to insert, lookup, and delete, as well as adding a version number to every node in the ART.
The ART implementation stores TIDs (uint64_t
s) using the Key
type defined in Key.h
.
This is simply a 128-byte buffer which uses the heap if the key exceeds 128 bytes.
The loadKey
function must also be defined. This function takes a TID and an empty
key and populates the key with the actual key for that TID. This means that every
value must store its key as well.
I store values in art_index
as internal_elem
s which contain the key
field, storing
an lcdf::Str
object. See art_index.load_key
which populates the passed in reference
to an ART key with the lcdf::Str
data.
Insert
The insert
function has been modified to take a make_tid
callback which will allocate
a new user element element and return the TID for the ART to store. This function may
not be called on every insert. It will not be called if the key already exists in the
tree.
Insert returns 3 values as a tuple: (tid, parent, old). The tid value is the TID of the node at that key. If the key was just inserted, tid will correspond to the new tid. If the key already existed, the tid of the key is returned. Parent is the node struct corresponding to the parent of the inserted node. It is null if the key already existed before the insert. Old is an obsolete node that should be deleted. This is usually null but if the insert caused a node upgrade then that node should be deleted but some extra precautions need to be made.
Lookup
ART's lookup takes a standard ART key and returns the TID (if found, 0 otherwise), and the TID's parent node. If the key was not found then it returns the node at the furthest point in the traversal where it determined the key was not in the tree.
Remove
Remove takes an ART key and the TID to remove and returns a possible old node. This old node may be null but if it has a value then it was made obsolete during a shrink and should be deleted.
RCU
I removed all "Epoch" RCU calls in the code and replaced them with calls that return
nodes to be deleted to the art_index
to rcu delete them using STO's mechanism. I
believe there may be still a little work to be done regarding RCU deletion on destruction
of the entire ART.
ART index
I simply took the ordered_index
data structure which wraps masstree and replaced
the masstree calls with ART calls.
On top of modifying the existing ordered index structure to use the ART instead of masstree, some code has also been added to deal with node upgrading in the ART.
Tests
The art_index
passes the unit tests in test/unit-dboindex.cc
but those tests
are not especially rigorous. I wrote some more tests targeting specific functionality
in the ART (for example, making sure node upgrading aborts properly). These new tests
are located in test/unit-artindex.cc
. The tests shouldn't be implementation specific
and the file also contains a masstree wrapper so the tests can be executed on masstree
as well (though the tests specifically test issues that would occur in ART rather than
masstree).
Benchmarks
The ART implementation outperforms masstree in some of the benchmarks I ran, including YCSB. The ART range scan isn't faster than masstree's however.
Extra stuff
A lot of time during the summer was spent working on the TART
which can be found
in datatype/TART.hh
. This is an implementation of a transactional ART that I wrote
from scratch but was unable to port to the interface required by the benchmarks
(the interfaced defined by ordered_index
). Once I had a more in-depth understanding
of how a TART might be written and how Masstree worked, I was able to take
ordered_index
and modify the references to masstree to instead call to the modified underlying
ART. That code is included in this PR but at this point is unused.