glommio
Eventfd not closed after executor finishes
As the following benchmark runs, the number of eventfds listed by lsof keeps increasing. With 2 new executors created in each round of the test, 12 more open eventfds remain after the executors finish.
```rust
use glommio::channels::shared_channel;
use glommio::prelude::*;
use std::sync::mpsc::sync_channel;
use std::time::{Duration, Instant};

fn test_spsc(capacity: usize) {
    let runs: u32 = 10_000_000;
    let (sender, receiver) = shared_channel::new_bounded(capacity);

    let sender = LocalExecutorBuilder::new()
        .pin_to_cpu(0)
        .spawn(move || async move {
            let sender = sender.connect().await;
            for _ in 0..runs {
                sender.send(1).await.unwrap();
            }
            drop(sender);
        })
        .unwrap();

    let receiver = LocalExecutorBuilder::new()
        .pin_to_cpu(1)
        .spawn(move || async move {
            let receiver = receiver.connect().await;
            for _ in 0..runs {
                receiver.recv().await.unwrap();
            }
        })
        .unwrap();

    sender.join().unwrap();
    receiver.join().unwrap();
}

fn main() {
    for i in 0..10000 {
        println!("==========");
        println!("Round {}", i);
        //test_spsc(10);
        test_spsc(100);
        test_spsc(1000);
        test_spsc(10000);
    }
}
```
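For anyone who wants to watch the leak from inside the process rather than through lsof, here is a minimal sketch that counts eventfds by reading /proc/self/fd. It is Linux-only and relies on the kernel reporting eventfd link targets as `anon_inode:[eventfd]`. The helper names `is_eventfd_link` and `count_eventfds` are made up for illustration and are not part of glommio.

```rust
use std::fs;
use std::io;

/// Hypothetical helper: true if a /proc/<pid>/fd symlink target looks like
/// an eventfd (Linux reports these as "anon_inode:[eventfd]").
fn is_eventfd_link(target: &str) -> bool {
    target.contains("eventfd")
}

/// Hypothetical helper: counts the eventfds currently open in this process.
fn count_eventfds() -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir("/proc/self/fd")? {
        let entry = entry?;
        // An fd may be closed between listing and readlink; skip those.
        if let Ok(target) = fs::read_link(entry.path()) {
            if is_eventfd_link(&target.to_string_lossy()) {
                count += 1;
            }
        }
    }
    Ok(count)
}

fn main() {
    match count_eventfds() {
        Ok(n) => println!("open eventfds: {}", n),
        Err(e) => eprintln!("could not inspect /proc/self/fd: {}", e),
    }
}
```

Calling `count_eventfds()` at the end of each round of the benchmark should show the same growth that lsof reports.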
Thanks @thirstycrow. I reproduced this, and I will hunt down where it is coming from. Leave this to me. To set expectations, I am about to go on paternity leave, so I'll be off for some days.
I suggest we manually raise the limit of file descriptors to a very high number so you can test your PR, and I'll fix this later.
OK, I know why this happens. We keep a clone of the sleep notifier inside each task, and there is a problem we have been aware of for a long time, though it has only been a minor bother until now: tasks that are not runnable do not have their destructors run when the executor drops. So that reference count never drops to zero, and the eventfd is never closed.
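To illustrate the mechanism, here is a minimal sketch using only the standard library. It is a toy model, not glommio's actual task code: `Notifier`, `Task` and `run_executor` are hypothetical names, and `std::mem::forget` stands in for a parked task whose destructor never runs.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for the sleep notifier: it owns a resource
// (an eventfd in the real case) that is released in Drop.
struct Notifier {
    closed: Rc<Cell<bool>>,
}

impl Drop for Notifier {
    fn drop(&mut self) {
        // In the real case this is where the eventfd would be closed.
        self.closed.set(true);
    }
}

// Hypothetical task that keeps a clone of the notifier handle.
struct Task {
    _notifier: Rc<Notifier>,
}

/// Simulates an executor shutting down. Returns whether the notifier
/// was closed once the executor dropped its own handle.
fn run_executor(leak_parked_task: bool) -> bool {
    let closed = Rc::new(Cell::new(false));
    let notifier = Rc::new(Notifier { closed: closed.clone() });
    let parked = Box::new(Task { _notifier: notifier.clone() });

    if leak_parked_task {
        // Models a non-runnable task whose destructor never runs.
        std::mem::forget(parked);
    } else {
        drop(parked);
    }

    // The executor drops its own reference to the notifier.
    drop(notifier);
    closed.get()
}

fn main() {
    // Task destroyed normally: the refcount reaches zero and Drop runs.
    assert!(run_executor(false));
    // Task never destroyed: its clone keeps the notifier alive.
    assert!(!run_executor(true));
    println!("leaked task keeps the notifier open: {}", !run_executor(true));
}
```

In this model, the strong count stays at one for as long as the leaked task holds its clone, so `Drop` never runs and the underlying resource is never released.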
I raised the limit of open files, and the test ran until round 2723, where it panicked with Cannot allocate memory (os error 12). I inspected the process status just before the panic: the VSZ and RSS in the ps output were 36.8G and 1.765G. I have 32G of memory on my laptop.
As a status update, I spent some time trying to fix this, but it is really hard because tasks often get destroyed under our nose. This brought me back to the refcount hell in the task structures. I'll keep looking at it.