actix Multi-arbiter systems leave lingering Arbiters behind after shutdown.

Expected Behavior

After System::run returns, all system's actors are stopped and dropped, all Arbiters are joined.

Current Behavior

If, inside the system, additional arbiters are created, they outlive the system.

Reproducible example:

#[test]
fn actix_sample() {
    System::run(|| {
        let arb = actix::Arbiter::new();
        A::start_in_arbiter(&arb, |_| A);
        std::thread::sleep_ms(100);
        System::current().stop();
    });
    eprintln!("B");
    std::thread::sleep_ms(300);
}

struct A;

impl Drop for A {
    fn drop(&mut self) {
        std::thread::sleep_ms(200);
        println!("A");
    }
}

impl actix::Actor for A {
    type Context = actix::Context<A>;
}

This prints B, A while it should print A, B. Otherwise, it's not really defined when the actor get's dropped and it might happen after main exits, skipping drops

Context

https://github.com/near/nearcore/issues/3925

Your Environment

Rust Version (I.e, output of rustc -V): rustc 1.49.0-nightly (91a79fb29 2020-10-07)
Actix Version: 0.11.0-beta.1

Feb 15 '21 17:02 matklad

Firstly, update to beta 2 to avoid a bug that was present regarding non-registered arbiters and to expose actix-rt v2 based API.

Arbiters are glorified threads and just like with threads you need to join them to await their destruction.

Therefore, the incantation to make these run in order is as follows:

#![allow(warnings)]

use actix::prelude::*;

#[test]
fn actix_sample() {
    let sys = System::new();
    let arb = actix::Arbiter::new();

    A::start_in_arbiter(&arb.handle(), |_| A);

    std::thread::sleep_ms(100);

    eprintln!("send sys stop");
    System::current().stop();

    eprintln!("run sys to completion");
    sys.run().unwrap();

    eprintln!("join arb");
    arb.join().unwrap();

    eprintln!("B");
    std::thread::sleep_ms(300);
}

struct A;

impl Drop for A {
    fn drop(&mut self) {
        std::thread::sleep_ms(200);
        println!("A");
    }
}

impl actix::Actor for A {
    type Context = actix::Context<A>;
}

Feb 15 '21 18:02 robjtede

Yeah, that works for the given example, but I am not feeling easy about this solution:

it still feels wrong that the Actor itself outlives the system it belongs to, regardless of the mechanism used to execute the actor's actions.
Semantics of Arbiter::join is really weird: Arbiter is Clone, and you can join any copy of, but only for the original instance join actually joins.

Feb 15 '21 18:02 matklad

Arbiter is no longer Clone since Handles now exist.

It would be possible to collect all Arbiter thread join handles and wait on them before System::run completes but it would limit the ability of downstream to choose how they're handled since they are not Clone in std.

This issue actually relates to actix-rt not to actix since they're re-exports. Maybe the solution is that in actix-land the threads are joined.

Feb 15 '21 18:02 robjtede

Do you have a use case in mind for wanting this kind of ordering guarantee?

As I see it, you can't do anything useful with a system after sending the stop message anyway. So, as long as your System is set up on the main thread you'll never have a dangling Arbiter (again, not that it seems to matter) and never, in any case, have a thread destroyed without calling it's destructor, if my reading and understanding of std::thread is correct.

Feb 15 '21 19:02 robjtede

I definitely do want actor's destructors to have been called when run returns, because destructors often enforce correctness critical invariant. IE, in NEAR's case, an actor manages a database, and Actor's Drop closes this db. It is pretty unfortunate to return from main without waiting for that to happen.

I don't think joining the thread itself is that important, as long as actors themselves are dropped.

Feb 15 '21 19:02 matklad

I'm 90% sure Rust will guarantee the destructors run on all Actors before the program terminates.

Feb 15 '21 20:02 robjtede

Here's the minimal repro demonstrating that it's not the case.

#![allow(warnings)]

use actix::prelude::*;

#[test]
fn actix_sample() {
    let sys = System::new();
    let arb = actix::Arbiter::new();
    A::start_in_arbiter(&arb.handle(), |_| A);
    System::current().stop();
    sys.run().unwrap();
    std::thread::sleep_ms(100);
}

struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("start drop");
        std::thread::sleep_ms(200);
        println!("finish drop");
    }
}

impl actix::Actor for A {
    type Context = actix::Context<A>;
}

leaked threads are terminated abruptly. In this case, the thread is terminated in the middle of Drop. It is a giant footgun and a reliability hazard, and people tend to learn about it the hard way (I published a blog post with a bug, lol: https://github.com/rust-lang/rust/issues/48820).

Feb 15 '21 20:02 matklad

#[test]
fn actix_sample() {
    actix::System::new().block_on(async {
        let arb = Arb(Some(actix::Arbiter::new()));
        let addr = A::start_in_arbiter(&arb.handle(), |_| A);
        std::thread::sleep_ms(100);
        actix::System::current().stop();
        actix_web::rt::task::yield_now().await;
    });
    eprintln!("B");
    std::thread::sleep_ms(300);
}

struct Arb(Option<actix::Arbiter>);

impl Drop for Arb {
    fn drop(&mut self) {
        self.0.take().unwrap().join().unwrap();
    }
}

impl Deref for Arb {
    type Target = actix::Arbiter;

    fn deref(&self) -> &Self::Target {
        self.0.as_ref().unwrap()
    }
}

struct A;

impl Drop for A {
    fn drop(&mut self) {
        std::thread::sleep_ms(200);
        println!("A");
    }
}

impl actix::Actor for A {
    type Context = actix::Context<A>;
}

System::run is just a block_on in latest actix-rt. So it behaves just like tokio's block_on. actix-rt is mainly used an re-export and a manager of multiple current thread runtime. It does not try to offer anything more than tokio offers.

If any change need to be done to let System stop yields and join all arbiter it should happen in actix repo or user code.

Feb 22 '21 07:02 fakeshadow