Eventfd not closed after executor finish

Open thirstycrow opened this issue 3 years ago • 9 comments

As the following benchmark runs, lsof lists an increasing number of eventfds. Each test round creates 2 new executors, and 12 more eventfds remain open after those executors finish.

use glommio::channels::shared_channel;
use glommio::prelude::*;

fn test_spsc(capacity: usize) {
    let runs: u32 = 10_000_000;
    let (sender, receiver) = shared_channel::new_bounded(capacity);

    let sender = LocalExecutorBuilder::new()
        .pin_to_cpu(0)
        .spawn(move || async move {
            let sender = sender.connect().await;
            for _ in 0..runs {
                sender.send(1).await.unwrap();
            }
            drop(sender);
        })
        .unwrap();

    let receiver = LocalExecutorBuilder::new()
        .pin_to_cpu(1)
        .spawn(move || async move {
            let receiver = receiver.connect().await;
            for _ in 0..runs {
                receiver.recv().await.unwrap();
            }
        })
        .unwrap();

    sender.join().unwrap();
    receiver.join().unwrap();
}

fn main() {
    for i in 0..10000 {
        println!("==========");
        println!("Round {}", i);
        //test_spsc(10);
        test_spsc(100);
        test_spsc(1000);
        test_spsc(10000);
    }
}
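
The count that lsof reports can also be sampled from inside the process between rounds. The following is a minimal sketch, assuming Linux, where each entry in /proc/self/fd is a symlink and eventfd descriptors resolve to the target anon_inode:[eventfd]. The helper names count_fds_matching and count_eventfds are hypothetical and not part of glommio.

```rust
use std::fs;
use std::io;

/// Counts this process's open file descriptors whose /proc/self/fd
/// symlink target contains `needle`. Linux-only (assumes procfs).
fn count_fds_matching(needle: &str) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir("/proc/self/fd")? {
        let entry = entry?;
        // A descriptor may be closed while we iterate (for example the one
        // used to read this directory), so a failed read_link is skipped.
        if let Ok(target) = fs::read_link(entry.path()) {
            if target.to_string_lossy().contains(needle) {
                count += 1;
            }
        }
    }
    Ok(count)
}

/// Assumption: eventfd descriptors show up as "anon_inode:[eventfd]".
fn count_eventfds() -> io::Result<usize> {
    count_fds_matching("anon_inode:[eventfd]")
}

fn main() -> io::Result<()> {
    println!("open eventfds: {}", count_eventfds()?);
    Ok(())
}
```

Calling count_eventfds() after each test_spsc call in the loop above would show whether the number keeps growing once the executors have been joined.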

thirstycrow avatar Oct 20 '21 13:10 thirstycrow

Thanks @thirstycrow. I reproduced this, and I will hunt down where it is coming from. Leave this to me. To set expectations, I am about to go on paternity leave, so I'll be off for some days.

I suggest we manually raise the limit of file descriptors to a very high number so you can test your PR, and I'll fix this later.

glommer avatar Oct 20 '21 16:10 glommer

Ok, I know why this happens. We keep a clone of the sleep notifier inside each task, and there is a problem that we have been aware of for a long time but that has only been a minor bother: tasks that are not runnable do not have their destructors run when the executor drops. So the notifier's reference count never drops to zero, and its eventfd is never closed.
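
The effect can be reproduced in isolation. The sketch below is not glommio's code: Notifier and Task are hypothetical stand-ins, a flag stands in for closing the eventfd, and std::mem::forget simulates a task whose destructor never runs. It only shows why a handle cloned into such a task keeps the shared resource alive after the owner drops its own handle.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for the sleep notifier that owns an eventfd.
/// Dropping it sets the flag, where the real type would close the fd.
struct Notifier {
    closed: Rc<Cell<bool>>,
}

impl Drop for Notifier {
    fn drop(&mut self) {
        self.closed.set(true);
    }
}

/// Hypothetical stand-in for a task holding a clone of the notifier.
struct Task {
    _notifier: Rc<Notifier>,
}

/// Returns true if the notifier was released once the "executor"
/// dropped its own handle.
fn notifier_closed_after_executor_drop(task_destructor_runs: bool) -> bool {
    let closed = Rc::new(Cell::new(false));
    let notifier = Rc::new(Notifier { closed: closed.clone() });
    let task = Task { _notifier: notifier.clone() };

    if task_destructor_runs {
        drop(task); // the task's clone is released
    } else {
        std::mem::forget(task); // simulates a task that is never destroyed
    }

    drop(notifier); // the executor releases its own handle
    closed.get()
}

fn main() {
    println!("task dropped:   closed = {}", notifier_closed_after_executor_drop(true));
    println!("task forgotten: closed = {}", notifier_closed_after_executor_drop(false));
}
```

In the forgotten case the reference count stays at one, so the Drop implementation never runs, which mirrors the leaked eventfds described above.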

glommer avatar Oct 20 '21 16:10 glommer

I raised the limit of open files, and the test ran until round 2723, where it panicked with Cannot allocate memory (os error 12). I inspected the process status just before the panic: the VSZ and RSS reported by ps were 36.8G and 1.765G. My laptop has 32G of memory.

thirstycrow avatar Oct 21 '21 06:10 thirstycrow

As a status update, I spent some time trying to fix this, but it is really hard because tasks often get destroyed under our nose. This brought me back to the refcount hell in the task structures. I'll keep looking at it.

glommer avatar Nov 15 '21 20:11 glommer