Not latching cb when pinning in page_cleaner (a.k.a. finally figure out pin/unpin!)
I guess that in order to call pin()/unpin() on a control block, the latch must be held; otherwise there is no synchronization with other parts of the code that manually set the pin_cnt of a control block. I tried replacing pin()/unpin() with latch_acquire()/latch_release(), but I got a weird runtime error related to lock_elision; maybe this is an architecture issue.
Anyway, perhaps a proper implementation of the buffer manager can ignore the pin()/unpin() protocol entirely and rely only on shared latches (an rwlock, for example).
void page_cleaner_base::flush_workspace(size_t from, size_t to)
{
    if (from == to) {
        return;
    }

    // Flush log to guarantee WAL property
    W_COERCE(smlevel_0::log->flush(_clean_lsn));

    W_COERCE(smlevel_0::vol->write_many_pages(
                _workspace[from].pid, &(_workspace[from]), to - from));

    for (size_t i = from; i < to; ++i) {
        bf_idx idx = _workspace_cb_indexes[i];
        bf_tree_cb_t& cb = _bufferpool->get_cb(idx);

        // Assertion below may fail for decoupled cleaner, and it's OK
        // w_assert1(i == from || _workspace[i].pid == _workspace[i - 1].pid + 1);

        cb.pin();
        if (cb._pid == _workspace[i].pid && cb.get_clean_lsn() < _clean_lsn) {
            cb.set_clean_lsn(_clean_lsn);
        }
        cb.unpin();
    }
}
I think pin/unpin can be removed entirely, because it's basically redundant -- we can do everything with cb.latch() and cb._used.
The idea of pinning is to allow certain operations that do not race as long as the frame is not replaced (e.g., by eviction). To achieve that, these operations simply "pin" the frame -- i.e., atomically increment its pin counter -- and proceed without further synchronization; the frame may only be removed by a CAS that swings the pin counter from 0 to -1. This "should" somehow be cheaper than taking the latch, but in practice it isn't.
A latch is also basically an atomic counter: acquiring it in shared mode increments it, but only if there are no writers. Thus, whatever you would do after pinning can be done by acquiring a shared latch. That also guarantees that the frame will not be replaced, because eviction must acquire an exclusive latch.
Of course pinning allows for more concurrency because certain control block updates can be performed in parallel with writes on the page. But in the end, I don't think this brings any visible improvement in practice, and it's just not worth the hassle.
My recommendation: remove all code related to pin, replace it with latches and the _used flag, see if everything still works fine, and be happy.
As for why you get "weird runtime errors" when acquiring the latch, that's a completely different issue.
One thing I forgot: pinning is (or was?) also used in swizzling to avoid replacing frames that still have swizzled pointers in them. In this case, it would double as a counter of swizzled pointers. But again, I think a simple swizzle_counter, protected by the latch, would be enough. Currently, this does not matter because only leaf pages are evicted.
There is something else I forgot about pinning, which is actually very important: pinning is used for B-tree scans (see class btcursor_t).
A scan is opened on a certain key, and the scan object holds a reference to the leaf node where that key was found -- the reference is actually the index of the control block in the buffer pool. When next() is called, we need to move to the next record in that same leaf page without performing another traversal. This is where the pin counter comes in: it guarantees that the page has not vanished or been replaced by another page when we come back to it. Because an arbitrarily long time may pass between two consecutive next() calls (god knows what the application is doing in that time), we don't want to hold a latch for the whole time the scan is using that page.
If the page gets replaced in between next() calls, that would simply not be detected. Thus, the scan would "fail silently" and return a completely unexpected record, which may either cause the scan to terminate early (because the bytes of the unexpected key happen to satisfy the scan stop condition) or, worse, return garbage to the application. Given that our benchmarks are completely synthetic and we never really looked into what data is being returned, this could be happening in Zero/Zapps and we just don't know.
So, what's the solution? For now, leave the pin counter in -- instead of removing it like I suggested above. However, because we don't want to force exclusive latch for that, the pin counter must be an std::atomic -- otherwise multiple threads holding the shared latch will race when incrementing or decrementing it. Now, if we make the counter atomic, do we still need to latch before pinning or unpinning? Probably not, which means that the original design was right all along and I was wrong. Live & learn...