Seqlock
Seqlock copied to clipboard
Why no fetch_add ?
Instead of 2 stores, why don't you fetch_add(1)
twice ?
Also, the initial code would break if 2 writers were to write at the same time since you don't check the result of the first load.
Typically, to be more safe, the (pseudo) code should be
write(const T & value)
{
int seq = counter.load(relaxed);
while (seq.CASweak(seq, seq+1, acq_rel));
if (seq & 1)
{ // Need to wait with a CAS until it's even again }
copy
counter.fetch_add(1, release);
}
But doing so make it likely much slower.
Actually c++11 atomic is an overkill, fetch_add is an overkill, and even normal compiler memory fence is an overkill, if you read Chinese you can check this post for the reason.
So IMO, the most concise and efficient Seqlock implementation could look like this:
template<typename T>
class Seqlock {
public:
T load() const {
T copy;
std::size_t seq0, seq1;
do {
asm volatile("" : "=m"(seq_) : :); // force read memory
seq0 = seq_;
asm volatile("" : "=m"(seq_), "=m"(value_) : :); // memory fence
copy = value_;
asm volatile("" : "=m"(seq_), "=m"(value_) : :); // memory fence
seq1 = seq_;
} while(seq0 != seq1 || seq0 & 1);
return copy;
}
void store(const T& desired) {
seq_++;
asm volatile("" : : "m"(seq_), "m"(value_) :); // memory fence
value_ = desired;
asm volatile("" : : "m"(seq_), "m"(value_) :); // memory fence
seq_++;
asm volatile("" : : "m"(seq_)); // force write memory
}
private:
// actually there's no need to align with cacheline size as seq and value are always updated together
std::size_t seq_ = 0;
T value_;
}
@X-Ryl669
- That will work, but it as a stronger operation than needed and will be slower.
- Only a single writer is supported.
@MengRao
You're solution will work on TSO memory model, it will not work for ARM. Even for amd64 your solution and mine will not work if the assignment operator uses non-temporal stores or string instructions. See Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide section 8.2.2
I think the generic solution is to require T to be trivially copyable and use a version of memcpy
that is guaranteed to not use non-temporal stores.
In any case it is currently impossible to implement seqlocks under C++11 memory model in a way that is portable to all CPU architectures.
The goal of this project was to try and express this concurrency primitive in portable C++11, conclusion is that it's not possible.
@MengRao It's also needed to align and pad the seqlock to avoid false sharing. If you have array/vector of seqlocks with multiple writer threads for different seqlocks you need to avoid false sharing.
@MengRao It's also needed to align and pad the seqlock to avoid false sharing. If you have array/vector of seqlocks with multiple writer threads for different seqlocks you need to avoid false sharing.
You're right, so putting an "alignas(64/128)" before the first member variable is enough, and the padding is automatic(you can check sizeof(Seqlock)).
@rigtorp It's portable, as long as you're using atomic increment, CAS or, at least, swap. But doing so, you'll see that there's no need to increment anything twice, and you can do as any other spinlock:
struct Lock
{
atomic<bool> l;
void acquire() { while (l.swap(true)); }
void release() { l.store(false, release); }
};
Since reading is not wait-free, the only advantage of seqlock is to have wait free writing. Provided that only a single writer is allowed, you can replace this with a spinlock that's only taken by the writer and another one that's taken by both (R/W). Typically the scheme is: For reading:
- Increment reader atomic count
- If writer lock is free 2.1 read the structure 2.2 Decrement reader atomic count 2.3 Done
- Else, spin on the lock (via CAS), goto 2
For writer:
- Take the writer lock
- Check reader counter (atomic load)
- If non 0, spin on the the reader counter until it's 0
- Modify protected object
- Release writer lock
Point 3 is not wait-free, but fortunately, it's short so it should be ok.
@MengRao It's also needed to align and pad the seqlock to avoid false sharing. If you have array/vector of seqlocks with multiple writer threads for different seqlocks you need to avoid false sharing.
You're right, so putting an "alignas(64/128)" before the first member variable is enough, and the padding is automatic(you can check sizeof(Seqlock)).
Yes but support for alignas with over alignment (>8 bytes on x86) is a very recent addition to compilers (required by C++17 i think). On old compilers it will just be ignored.
@X-Ryl669 Yes it's possible to implement other types of locks, but then it's no longer a seqlock. The whole point here was to implement seqlocks.
And the advantage of a seqlock over e.g. a reader/writer lock or the lock design above is that with the seqlock, readers don't have to modify some shared location which would otherwise create a scalability bottleneck. This is a very useful property when you have shared data structures that are often read (by many threads) and seldom written. Networking has many such data structures.
Even for amd64 your solution and mine will not work if the assignment operator uses non-temporal stores or string instructions.
will this implementation work on x86_64 or only on x86 ??
will this implementation work on x86_64 or only on x86 ?? Both
@rigtorp
In any case it is currently impossible to implement seqlocks under C++11 memory model in a way that is portable to all CPU architectures.
Just to be clear you said that, meaning, "... (without having to submit a stronger fence like seq_cst
)", because that'd emit a stronger (and suboptimal) fence in x86 like lock
prefix, right?