tower icon indicating copy to clipboard operation
tower copied to clipboard

propogate panics through `Buffer`

Open yaahc opened this issue 4 years ago • 10 comments

Problem

In zebra we're seeing a panic during one of our services poll_ready which kills the thread its running in but not the entire application.

https://github.com/ZcashFoundation/zebra/issues/394

The issue seems to be caused by Buffer not propagating the panic.

Proposed Solution

  • change Buffer::new to store the JoinHandle rather than discarding it
  • change poll_ready on Buffer to propogate errors from the join handle in the case where the tx to the spawned future fails, rather than creating a new error.

yaahc avatar May 26 '20 17:05 yaahc

Also, I plan on submitting a PR to implement this feature and test it against our code to verify that it does in fact solve our bug and cause the panics to get propagated correctly.

yaahc avatar May 26 '20 17:05 yaahc

Further testing showed that we were actually discarding the synthesized error that represents the background thread being unreachable, so even if Buffer propagated panics this would still have been a bug, but I think ideally I'd still like to see Buffer propagate panics.

yaahc avatar May 26 '20 23:05 yaahc

This ties into the error handling/API design section of my proposal in #450

saleemrashid avatar May 28 '20 15:05 saleemrashid

Seems like a good idea! Would tie is pretty directly to tokio but I think that is totally fair right now :)

LucioFranco avatar Jun 15 '20 16:06 LucioFranco

@yaahc did we ever get this fixed and can we close this? Or should we keep it open and try to get some fix in 0.4?

LucioFranco avatar Nov 27 '20 19:11 LucioFranco

I don't think this ever got fixed. We ended up switching to panic=abort in zebra so it was less important for us to fix this, probably still a good idea for 0.4 but not a breaking change afaik.

yaahc avatar Nov 27 '20 19:11 yaahc

Okay, this is probably needed actually, we ran into an issue with panic = abort that will force us to disable it until I can add a new feature in the compiler and have it make it to stable...

I'll start working on this right away

yaahc avatar Dec 09 '20 18:12 yaahc

@yaahc I would like to look into this but I can't quite reproduce it. Probably just something I'm misunderstanding.

In this example the panic in poll_ready does get propagated back to main:

use std::future::Future;
use std::{
    pin::Pin,
    task::{Context, Poll},
};
use tower::{buffer::Buffer, BoxError};
use tower::{Service, ServiceExt};

#[tokio::main]
async fn main() {
    let mut service = Buffer::new(MyService, 100);

    let res = service.ready().await.unwrap().call("bar").await.unwrap();

    eprintln!("panic not propagated")
}

struct MyService;

impl Service<&'static str> for MyService {
    type Response = &'static str;
    type Error = BoxError;
    type Future = ResponseFuture;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        panic!("poll_ready panicked!")
    }

    fn call(&mut self, req: &'static str) -> Self::Future {
        ResponseFuture
    }
}

struct ResponseFuture;

impl Future for ResponseFuture {
    type Output = Result<&'static str, BoxError>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(Ok("foo"))
    }
}

Is your scenario different or perhaps it magically fixed itself 🤞

Tested with

tokio = { version = "1.2.0", features = ["full"] }
tower = { version = "0.4.6", features = ["full"] }

davidpdrsn avatar Mar 01 '21 19:03 davidpdrsn

@davidpdrsn what's the error output?

I think it must still be an issue because the tokio::spawn here still discards the JoinHandle, but I'm not positive. If I had to guess why it looks like its working I'm guessing that the panic in the services is still printed to stdout by the panic handler before the unwind starts and then the panic is caught, and then the poll_ready on the Buffer handle fails because the background service has fallen over, but it doesn't pass back a captured version of the panic.

yaahc avatar Mar 01 '21 20:03 yaahc

If I had to guess why it looks like its working I'm guessing that the panic in the services is still printed to stdout by the panic handler before the unwind starts and then the panic is caught

Ah yeah that is indeed the case. I didn't read the output carefully enough. It is:

thread 'tokio-runtime-worker' panicked at 'poll_ready paniced!', src/main.rs:26:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Closed', src/main.rs:13:64

So "thread 'main' panicked at 'called Result::unwrap() on an Err value: Closed', src/main.rs:13:64" is the actual reason it fails, not the panic from poll_ready.

I'll dig into this!

davidpdrsn avatar Mar 01 '21 20:03 davidpdrsn