cassandra
Initial commit of the PersistentMemoryMemtable
This initial commit is meant to get some feedback on the memtable implementation. The code was tested using 'ant test'. Some of the changes:
- Code has a dependency on Low Level Persistence Library which has been added in the lib directory.
- Some modifications were made to build.xml to accommodate persistent memory testing.
- Eclipse-warning has been temporarily turned off because it required prematurely closing an iterator used in the code. We are looking into this.
This looks pretty nice so far. My main concern is around concurrent read safety and is probably answered by how TransactionalHeap works. The concern is that readers may hold references to some data blocks for a long time (e.g. if they are performing a big range query) and we must make sure that the data is not modified under them. The code does seem to do this for some blocks, and I wonder if and how the heap can guarantee e.g. that if a reader is sent to sleep in the middle of deserializing a row, it won't wake up to continue with data in the block that is from a different version of the row and where their current processing position does not make sense and causes a segmentation fault or similar.
@blambov PersistentMemoryMemTable internally uses the following data structures to persist data:
• Concurrent Adaptive Radix Tree (CART), the data structure which holds references to tables, is thread-safe. It is comprised of multiple single-threaded shards.
• Adaptive Radix Tree (ART), the data structure which refers to partitions, is not thread-safe.
Multiple gets on the CART for the same partition return different Java objects even though they point to the same data in persistent memory. This makes synchronization of partitions tricky, as different threads would have different Java objects and potentially different locks. To get around this, we use shard locks to protect the partitions, i.e. we hold the lock on the shard until the R/W operations on the partition have completed. This is a coarse-grained initial solution to the problem; we are looking at alternatives.
To answer your question, a reader who accesses a partition for a long time has exclusive access to that partition and will block all other threads from accessing that partition as well as any other partition in that shard.
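As a rough illustration of the shard-lock scheme described above, here is a minimal sketch. It is not code from the patch: ShardedPartitionStore, Shard, shardFor and isShardLocked are hypothetical names, a TreeMap stands in for the single-threaded ART, and the data lives on the ordinary Java heap rather than in persistent memory. The point it shows is that the lock belongs to the shard rather than to the per-call Java object, and that it is held for the whole duration of the read or write.

```java
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the patch.
final class ShardedPartitionStore {
    // One shard = one lock plus one non-thread-safe map.
    // The TreeMap is an assumed stand-in for the single-threaded ART.
    private static final class Shard {
        final ReentrantLock lock = new ReentrantLock();
        final TreeMap<String, String> partitions = new TreeMap<>();
    }

    private final Shard[] shards;

    ShardedPartitionStore(int shardCount) {
        shards = new Shard[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new Shard();
        }
    }

    // Every caller asking for the same key gets the same Shard, and thus the same lock,
    // regardless of which wrapper objects they hold for the partition itself.
    private Shard shardFor(String partitionKey) {
        return shards[Math.floorMod(partitionKey.hashCode(), shards.length)];
    }

    void write(String partitionKey, String value) {
        Shard shard = shardFor(partitionKey);
        shard.lock.lock();
        try {
            shard.partitions.put(partitionKey, value);
        } finally {
            shard.lock.unlock();
        }
    }

    // The reader callback runs while the shard lock is held, so a slow reader
    // blocks every other thread that needs any partition in the same shard.
    <R> R read(String partitionKey, Function<String, R> reader) {
        Shard shard = shardFor(partitionKey);
        shard.lock.lock();
        try {
            return reader.apply(shard.partitions.get(partitionKey));
        } finally {
            shard.lock.unlock();
        }
    }

    // Exposed only so the sketch can be checked; reports whether any thread holds the lock.
    boolean isShardLocked(String partitionKey) {
        return shardFor(partitionKey).lock.isLocked();
    }
}
```

In this sketch a long-running callback passed to read keeps the shard locked until it returns, which matches the trade-off described above: the reader never sees the partition change under it, at the cost of blocking all other access to that shard.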