concread LinCowCell - UAF crash under write/read contention

Versions:

concread 0.5.5
389-ds-base (various versions with the recent NDN cache rework)

Description: After upgrading concread 0.2.21 -> 0.5.5, 389-ds-base crashes with a seg-fault in NDN cache. (stack trace is in the attachments) stacktrace.txt

And it looks like that a use-after-free comes from from LinCowCell::commit(). During commit the writer thread holds only an ArcSwap guard (load()) to the previous generation, not a real Arc. That should mostly be okay as it's pretty safe hazard like pointer (debt).

But I think that scheme might have a corner case: it assumes no nested Arc-Swap operations on the same thread will exhaust the thread-local fast-debt slots while the writer is still in its critical section.

And I think Concread’s LinCowCell::commit does exactly this:

let rwguard = self.active.load(); -> occupies a debt slot.

rwguard.pin.store(Some(new_inner.clone())) -> inside ArcSwapOption::store the writer thread itself calls strategy.wait_for_readers, which (again) calls load() to get replacement guards while sweeping debts.

The fast debt table for that thread can be full pretty quickly (it is pretty small for our case). If that overflows at the wrong moment, the fallback path bumps the ref-count of the pointer after the “outer” store has already swapped it away. A very tight race then lets the pointer hit zero while a nested clone is still executing, giving the precisely-observed crash in the atomic fetch_add.

That is why the failure appears inside Option<Arc<…>>::clone – the raw pointer being ref-counted is already invalid.

It happens under the medium/high contention.

SO, if I understand correctly, we might misuse the load() function from ArcSwap.

I don't think it'll result in a huge performance drop (if any) if we'll use load_full() instead here: https://github.com/kanidm/concread/blob/master/src/internals/lincowcell/mod.rs#L176-L200

New code (possible, - not tested):

fn commit(&self, write: LinCowCellWriteTxn<T, R, U>) {
    // Destructure our writer.
    let LinCowCellWriteTxn {
        // This is self.
        caller: _caller,
        mut guard,
        work,
    } = write;

    // Get the previous generation.
    let old_arc = self.active.load_full();

    // Start to setup for the commit.
    let newdata   = guard.pre_commit(work, &old_arc.data);
    let new_inner = Arc::new(LinCowCellInner::new(newdata));

    // Link the old generation to the new one for linear drop ordering.
    old_arc.pin.store(Some(new_inner.clone()));

    // Publish the new generation.
    self.active.store(new_inner);

    // Drop our strong ref; old generation will be reclaimed only
    // after all readers are done.
    drop(old_arc);
}

May 22 '25 00:05 droideck

Given the arc-swap docs state:

This returns a proxy object allowing access to the thing held inside. However, there’s only limited amount of possible cheap proxies in existence for each thread ‒ if more are created, it falls back to equivalent of [load_full](https://docs.rs/arc-swap/1.6.0/arc_swap/struct.ArcSwapAny.html#method.load_full) internally.

I suspect this is an issue in ArcSwap. I have no issue using load_full though, especially if it's "correct" in our case.

May 26 '25 12:05 Firstyear

It looks like the readtxn already uses load_full(), so I think this is okay as a change to make :)

May 26 '25 12:05 Firstyear