swym
Lock Sorting
I'm interested in modifying the current implementation so that locks are acquired in order based on memory location. I'm trying to implement sorting within the `epoch_locks()` function call of `write_log` and `read_log`, and have a couple of questions:
- Is this the correct place to implement sorting (given that I don't want to change the underlying data structure to a heap)? Are there additional calls/places that I need to sort to make sure global lock order is preserved?
- How are epoch_locks stored within the `self.data` field (in `write_log`, for example) that is then transformed with a `flat_map`? All I can see is an fvec with an additional data field, but I'm unsure how the data numbers translate to the entries and tcells that are returned and processed. Are these values the times of last write? Or memory locations? How is the mapping done between these and the actual `EpochLock` structures returned?
I can understand why you might want to acquire locks in order; however, I strongly suspect it's only going to impact performance negatively without adding any guarantees. Transactions are already guaranteed to commit after a certain amount of time due to an eventual-fairness guarantee (effectively grabbing a global lock). If you still wanna sort, read on!
The write log contains both the "new" pending value of the `TCell` as well as a pointer to the original `TCell`. These are stored contiguously in memory, meaning each element of the vec is a different size! This is all unsafe Rust, but the vtable of `WriteEntry` lets us recover size and alignment information at runtime. Something like this, IIRC:
```
| 8 bytes      | 8 bytes   | xxx bytes | 8 bytes      | 8 bytes   | yyy bytes |
| x vtable ptr | TCell ptr | x value   | y vtable ptr | TCell ptr | y value   |
```
This data structure doesn't really support random access, because you must read the x vtable ptr in order to know where the y vtable ptr is in memory.
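A minimal sketch of the mechanism, using hypothetical stand-ins (`WriteEntry` here is a bare marker trait and `Entry` an illustrative type, not swym's actual definitions):

```rust
use std::mem;

// Hypothetical stand-ins, just to show the mechanism.
trait WriteEntry {}

struct Entry<T> {
    pending: T,
}

impl<T> WriteEntry for Entry<T> {}

fn main() {
    let entry = Entry { pending: [0u8; 24] };

    // A trait-object reference is a fat pointer: a data pointer plus a
    // vtable pointer. The vtable records the concrete type's size and
    // alignment, so both are recoverable at runtime:
    let dyn_entry: &dyn WriteEntry = &entry;
    assert_eq!(mem::size_of_val(dyn_entry), 24);
    assert_eq!(mem::align_of_val(dyn_entry), 1);
}
```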
If you want to acquire the locks in sorted order, then I'd recommend another data structure for the write log, or a separate data structure containing pointers to the `EpochLock`s that can be sorted and is kept in sync with the `DynVec<WriteEntry>`.
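A rough sketch of that second option, under illustrative names that are not swym's API: a side vector of lock references that grows alongside the write log and is sorted by address just before acquisition.

```rust
// Illustrative stand-in for swym's lock type.
struct EpochLock;

// Side structure kept in sync with the write log: record a reference every
// time an entry is pushed, sort by address just before acquiring.
struct LockIndex<'a> {
    locks: Vec<&'a EpochLock>,
}

impl<'a> LockIndex<'a> {
    fn record(&mut self, lock: &'a EpochLock) {
        self.locks.push(lock);
    }

    fn sorted(&mut self) -> impl Iterator<Item = &'a EpochLock> + '_ {
        // Sorting by pointer value gives a global order on memory location.
        self.locks.sort_by_key(|l| *l as *const EpochLock as usize);
        self.locks.iter().copied()
    }
}
```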
Also, the code for the write log is really optimized (I hope, anyway) and is, unfortunately, a fairly tricky "unsafe" part of the code base. Don't let that deter you tho; break stuff and have fun.
Thanks! I'm stuck on a small issue: I have the locks in order in a vector or iterator (sorted using a priority queue), but I'm not sure how to use fvec and dyn_vec to return them properly. Is there any way to convert a standard vector to a dyn_vec so that I can return it from `epoch_locks()` as an iterator over the dynamic vector? Basically, how do I transform a `std::slice::Iter` into an `internal::alloc::dyn_vec::Iter` in a way that preserves the order and integrity of the structs? I couldn't find a `new()` or push/pop functions for the dynamic vector.
I'm still running into the same issue as above, where I can't return a flatmap of a standard vector. You can see my attempted changes to `epoch_locks()` in src/internal/write_log.rs on a new branch: https://github.com/scaspin/swym/tree/lock_ordering
When trying to return the sorted map, I get an (expected) error on build:
```
expected struct `internal::alloc::dyn_vec::Iter`, found struct `std::slice::Iter`
  | ...is found to be `std::iter::FlatMap<internal::alloc::dyn_vec::Iter<'a, (dyn internal::write_log::WriteEntry + 'tcell)>, std::option::Option<&'a internal::epoch::EpochLock>, _>` here
```
Which I attempted to solve by:

- changing the underlying structure of the iter to be based on fvec/dyn_vec rather than a slice
- attempting to convert the vector to a vtable
- changing the return type of the function, which gave the error `doesn't have a size known at compile-time`, which I attempted to fix in multiple places but could not resolve
Any advice and tips would be appreciated since I am not too confident about my use of fvecs and vtables here. Thanks!
> changing the return type of the function, which gave the error `doesn't have a size known at compile-time`, which I attempted to fix in multiple places but could not resolve
You'll definitely have to do this. Rust has some syntax for saying "I'm just returning an iterator":

```rust
fn blah() -> impl Iterator<Item = &'a EpochLock>
```
The problem is that the `Item` type of a priority queue, I assume, is some sort of pair instead of just a `&'a EpochLock`.
I would suggest just `collect`ing the flatmap result into a regular `Vec`. Then do a `sort_by` on the vec using ptr identity. Then you can return `vec.into_iter()`.
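A sketch of that shape (the `EpochLock` stand-in and the `epoch_locks_sorted` name are hypothetical; in swym this would wrap whatever the real `epoch_locks()` flat-maps over):

```rust
// Hypothetical stand-in for swym's lock type.
struct EpochLock;

fn epoch_locks_sorted<'a>(
    locks: impl Iterator<Item = &'a EpochLock>,
) -> impl Iterator<Item = &'a EpochLock> {
    // Collect whatever the flat_map yields into an ordinary Vec...
    let mut v: Vec<&'a EpochLock> = locks.collect();
    // ...sort by pointer identity, i.e. by the lock's memory address...
    v.sort_by_key(|lock| *lock as *const EpochLock as usize);
    // ...and return an owning iterator. `impl Iterator` in the signature
    // means the caller never has to name dyn_vec's concrete Iter type.
    v.into_iter()
}
```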
That will at least get the feature working. From there you can try to optimize the iter-to-sorted-iter operation. Things to try might be collecting into a `BTreeSet` or other sorted container.
`SmallVec` is another option to sidestep the heap for smaller transactions. I imagine that will get you a lot closer to the perf of no lock sorting. However, the sort operation itself is probably gonna hurt inlining substantially.
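A variant of the same sketch, assuming the external `smallvec` crate (the inline capacity of 8 is an arbitrary guess):

```rust
use smallvec::SmallVec;

// Hypothetical stand-in for swym's lock type.
struct EpochLock;

fn epoch_locks_sorted<'a>(
    locks: impl Iterator<Item = &'a EpochLock>,
) -> impl Iterator<Item = &'a EpochLock> {
    // Up to 8 lock pointers live inline on the stack; larger transactions
    // spill to the heap.
    let mut v: SmallVec<[&'a EpochLock; 8]> = locks.collect();
    v.sort_by_key(|lock| *lock as *const EpochLock as usize);
    v.into_iter()
}
```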
When the priority queue types are returned using `.to_sorted_vec()`, only the locks are returned in order, without priorities (that's why I translate to a vector first and then an iterator, since the `.to_iter()` functionality of the library returns an iterator over the tuples). The sorting works, and I don't really care about the costs at the moment.
The main issue I'm running into is returning the flatmap in a way that'll be accepted and returned correctly. The same issue happens when I convert the flatmap to a vector using `collect`. Basically, I have an iterator over the locks similar to `self.data.iter()`, but the underlying vector structure is a standard vector rather than an fvec/vtable. This is where the error message `expected struct internal::alloc::dyn_vec::Iter, found struct std::slice::Iter` occurs again. Your suggestion of sorting after `flat_map` yields the same compilation error. Any additional advice? Or how do I construct a dyn_vec out of a standard vec?
Ended up bypassing this issue by just sorting on every call to `epoch_locks()`. Not a super elegant solution, but it allowed me to avoid dealing with the return statements in `epoch_locks()` and actually sort the locks without decomposing the logs. I think there were only ~4 calls to `epoch_locks()` where order matters (acquisition and release); I didn't bother sorting for calls that merely wanted the lock count.
> Ended up bypassing this issue by just sorting on every call to `epoch_locks()`.
Ah, my mistake, I thought that was what you were going for. I don't think release order would matter, fwiw.