tokio icon indicating copy to clipboard operation
tokio copied to clipboard

Tasks should drop the waker when they complete

Open Darksonn opened this issue 2 years ago • 4 comments

Currently the JoinHandle waker is stored in the task struct until every ref-count to the task is dropped. Therefore, you can have a ref-cycle if two tasks await each other's JoinHandle. For example, this leaks:

use tokio::task::JoinHandle;
fn main() {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    
    let (send, recv) = tokio::sync::oneshot::channel::<JoinHandle<()>>();
    
    let jh = rt.spawn(async move {
        let jh2 = recv.await.unwrap();
        jh2.await.unwrap();
    });
    
    let jh2 = rt.spawn(async move {
        jh.await.unwrap();
    });
    send.send(jh2).unwrap();
    
    rt.block_on(async {
        for _ in 0..10 {
            tokio::task::yield_now().await;
        }
    });
}

Darksonn avatar May 29 '22 18:05 Darksonn

Here's a proposal for fixing this:

The JoinHandle waker

There are two bits that control access to the JoinHandle waker field: JOIN_WAKER and JOIN_INTEREST. The rules are as follows:

  • If JOIN_INTEREST is one, then the JoinHandle may read from the waker field.

  • If the JoinHandle set the JOIN_WAKER field to zero, then the JoinHandle may mutate the waker field.

  • If the runtime set the JOIN_WAKER field to zero and JOIN_INTEREST is one, then the runtime may read from the waker field.

  • If the runtime set the JOIN_WAKER field to zero and JOIN_INTEREST is zero, then the runtime may mutate the waker field.

  • The JOIN_WAKER field starts out as a zero. The initial zero counts as having been set by the JoinHandle.

The runtime may only set the JOIN_WAKER field to zero when COMPLETE is one. This ensures that if the JoinHandle ever sees both JOIN_WAKER and COMPLETE being zero, then the JoinHandle knows that it may mutate the waker field.

To show that the above never results in any race conditions, we consider the two cases where the waker field may be modified:

  • If the JoinHandle mutates the waker, then the JoinHandle has set JOIN_WAKER to zero. The runtime never accesses the field unless the runtime is the one who set it to zero, so this ensures that the runtime will not race on the field.
  • If the runtime mutates the waker, then the JOIN_INTEREST field is zero and the runtime is the one who set JOIN_WAKER to zero. However, the JoinHandle can only access the field if JOIN_INTEREST is one, or if the JoinHandle set the JOIN_WAKER field to zero. Therefore, the JoinHandle cannot race on the waker field.

Normally, the waker is destroyed by the JoinHandle when the JoinHandle is destroyed. The JoinHandle does this by setting JOIN_WAKER and JOIN_INTEREST to zero, and then destroying the waker. However, if JOIN_WAKER is already zero when this happens, then the JoinHandle does not perform any cleanup of the waker. This situation only happens if no waker is set, or if the runtime is concurrently waking the waker.

When the runtime wakes the JoinHandle, it does this by first setting JOIN_WAKER to zero to obtain access to the waker field. If JOIN_WAKER was already zero, then nothing happens. Once the runtime has woken the waker, it tries to set the JOIN_WAKER bit back to true, but in a way that fails if JOIN_INTEREST is zero. In that case, the runtime now knows it can mutate the waker and the runtime will destroy it.

Darksonn avatar May 30 '22 19:05 Darksonn

@Darksonn I would be interested in having a go at this issue.

I'm slowly getting my head around what currently exists and how that relates to the proposed solution.

My understanding is that the new part of this proposal is only the last sentence. That when the runtime wakes the waker and tries to set the JOIN_WAKER flag back to 1, but finds that JOIN_INTEREST is 0, it will destroy the waker on the task. And that this would all happen somewhere from RawTask::try_read_output down to can_read_output in the harness module. It looks to me like everything else described here is already implemented.

Am I a long way off?

hds avatar Jul 29 '22 16:07 hds

I already have a 75% complete implementation of this.

Darksonn avatar Jul 29 '22 19:07 Darksonn

Oh cool. I'd be interested in seeing it when it's done. I feel I might understand some of it.

hds avatar Aug 01 '22 07:08 hds