foundationdb `loop-choose-when` can sometimes cause priority inversion

In certain cases, a loop-choose-when block can introduce priority inversion.

For example in the following case:

loop choose {
    when(T value1 = wait(FutureStream<T>f1)) {
        wait(someLock->take());
        // do awesome stuff
    }
    when(T value2 = wait(FutureStream<T>f2)) {
        // do even more awesome stuff
    }
}

Let's say the f1's task priority is lower than f2. There could be an execution path which causes priority inversion. Consider the following execution:

At time t_0, because none of f1 or f2 is ready yet, the program simply registers a callback for each of them and yield the control flow;
At time t_1, f1 is ready and thus callback for it got fired. The program first removes all other callbacks for this actor then try to grab the lock.
If the lock is not ready, the program registers a callback for the lock and then the control flow will be yield from there. Notice as of now, the only callback registered is for the lock.
Say after that, the f2 is ready on the run loop before the next f1. But since we don't have a callback here, the only way to know that would be examining if the future is ready by calling f1.isReady().
Now if the program resumes the execution when the lock is ready, it will continue executing the do awesome stuff. After that the program returns to the head of the loop. If f1 is ready during the wait on the lock, even it came after the f2, the program will simply execute its block, and thus repeats from step 2 - 6.

The aforementioned bug actually boils down to:

A when statement with higher delaying order waiting for a low priority run loop task
AND that when block contains another wait.

So the users of Flow need to pay special attention when they decided to use a wait inside the non-tail when block in a loop-choose-when.

I didn't see documentations/issues/questions regarding to this and thus want to open this tracking issue for both visibility and possible future solution.

Apr 27 '19 05:04 dongxinEric

Maybe we can at least emit a warning (console warning message or an actual complier warning or both) from ActorCompiler if it sees this pattern.

Apr 27 '19 05:04 dongxinEric

What is the definition of priority in the above example? Does it mean that wait in the when condition should have higher priority than the wait inside the when statement?

If we wrap the first when into an actor and use an ActorCollector to collect the actor, will the problem be solved?

Apr 27 '19 17:04 xumengpanda

What is the definition of priority in the above example? Does it mean that wait in the when condition should have higher priority than the wait inside the when statement?

By priority I mean the Task Priority on the run loop. In the scenario described above, the task with a higher priority on the run loop becomes ready before the lower priority task. But actors waiting on it won't be notified. And hence the priority inversion.

Notice this behavior does not break the contract of the choose-when structure, which is the reason why it's not a bug, but rather just an anti pattern.

If we wrap the first when into an actor and use an ActorCollector to collect the actor, will the problem be solved?

What is an ActorCollector ?

Apr 27 '19 17:04 dongxinEric

@xis19 added a linter to detect this code pattern in https://github.com/apple/foundationdb/pull/9800

I did an audit of the results. There are a few cases of this exact problems. The majority are delay(0) or wait before a break or return, which are fine.

Mar 27 '23 16:03 jzhou77

foundationdb foundationdb copied to clipboard

`loop-choose-when` can sometimes cause priority inversion

foundationdb
foundationdb copied to clipboard