Benchmark panics if an endpoint that assigns a value times out
I've encountered an issue where an endpoint whose response I assign to a variable (and depend on later in the plan) times out. This causes the whole benchmark to panic with the following message:
(output from the concurrent worker threads is interleaved; the distinct messages are:)

Error connecting 'http://127.0.0.1:8000/endpoint/': reqwest::Error { kind: Request, url: "http://127.0.0.1:8000/endpoint/", source: TimedOut }
thread 'tokio-runtime-worker' panicked at 'Unknown 'resp.body.id' variable!', src\interpolator.rs:39:11
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', src\benchmark.rs:89:106
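The last line is why the whole run aborts: the worker task panics, and unwrapping its join result re-raises the panic on the main thread. A minimal, self-contained illustration of that mechanism (not drill's actual code; assumes tokio with the "macros" and "rt-multi-thread" features) looks like this:

// Illustration only: a panic inside a spawned worker task becomes a
// JoinError::Panic that aborts the main thread when the join result is
// unwrapped, which matches the last line of the log above.
#[tokio::main]
async fn main() {
    let handle = tokio::spawn(async {
        // Plays the role of the interpolator panic in the worker thread.
        panic!("Unknown 'resp.body.id' variable!");
    });

    // Unwrapping the join result (as drill does in src/benchmark.rs)
    // turns the worker's panic into JoinError::Panic(...) on the main thread.
    handle.await.unwrap();
}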
Minimal reproducible example
Set up an actix server like this:
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder, Result};
use serde_json::json;
use std::time::Duration;

#[get("/endpoint/")]
async fn endpoint() -> Result<impl Responder> {
    // <--- NB! the 12 s delay exceeds drill's 10 s request timeout
    actix_rt::time::delay_for(Duration::from_secs(12)).await;
    Ok(HttpResponse::Ok().json(json!({ "id": 1 })))
}

#[get("/endpoint_2/{id}")]
async fn endpoint_post(id: web::Path<u32>) -> Result<impl Responder> {
    println!("Got: {}", *id);
    Ok(HttpResponse::Ok().json(json!({ "status": "ok" })))
}

async fn app() -> std::io::Result<()> {
    HttpServer::new(move || {
        App::new()
            .service(endpoint)
            .service(endpoint_post)
    })
    .bind("127.0.0.1:8000")?
    .client_timeout(0)
    .run()
    .await
}

fn main() {
    let mut system = actix_rt::System::new("server1");
    match system.block_on(app()) {
        Ok(_) => println!("System exited!"),
        Err(e) => println!("{}", e),
    };
}
Cargo.toml
[dependencies]
serde_json = "1.0.53"
actix-rt = "1.1.1"
actix-web = "2.0.0"
Then run a benchmark like this:
concurrency: 10
base: 'http://127.0.0.1:8000'
iterations: 1

plan:
  - name: Get Id
    request:
      url: /endpoint/
      method: GET
      headers:
        Content-Type: 'application/json'
    assign: resp

  - name: Get Id
    request:
      url: /endpoint_2/{{resp.body.id}}
      method: GET
Suggestions
I'm honestly not sure what the best way to handle this is, but I don't think aborting the whole run with a panic is it.
An error in one endpoint may affect another call later in the plan, so one option is to report both calls as errors, or to simply skip a call that depends on a returned value which turned out to be Null, so that a false error is not reported for the second endpoint; it is just one "missing" call. A message could be logged during the run, or the skip could be reported in the final summary.
Right now the value returned from an endpoint timing out is stored as serde_json::Value::Null, see https://github.com/fcsonline/drill/blob/89b790fec1e84d8b8dc367834b2b17b2b3401ea7/src/actions/request.rs#L276-L286.
This value is then retrieved by the Interpolator, which panics if the variable cannot be resolved. If Interpolator::resolve returned an Option<String>, the caller could check for that and report an error instead of panicking when the value is not present.
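A minimal sketch of what that could look like, using hypothetical names; the real code in src/interpolator.rs is likely structured differently:

// Hypothetical sketch, not drill's actual interpolator.
use std::collections::HashMap;
use serde_json::Value;

struct Interpolator<'a> {
    context: &'a HashMap<String, Value>,
}

impl<'a> Interpolator<'a> {
    // Returning Option<String> lets the caller decide how to handle a
    // missing or Null value instead of the interpolator panicking.
    fn resolve(&self, path: &str) -> Option<String> {
        let mut parts = path.split('.');
        let root = parts.next()?;
        let mut value = self.context.get(root)?;

        for key in parts {
            value = value.get(key)?;
        }

        match value {
            Value::Null => None, // a timed-out request stored Null here
            Value::String(s) => Some(s.clone()),
            other => Some(other.to_string()),
        }
    }
}

fn main() {
    let mut ctx = HashMap::new();
    // Simulate what happens after a timeout: the assigned variable is Null.
    ctx.insert("resp".to_string(), Value::Null);

    let interpolator = Interpolator { context: &ctx };

    match interpolator.resolve("resp.body.id") {
        Some(id) => println!("would call /endpoint_2/{}", id),
        None => eprintln!("skipping request: 'resp.body.id' is not available"),
    }
}

With an Option, the benchmark loop can decide whether to skip the dependent request or record an error, instead of the interpolator deciding for it by panicking.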
Another option is to make the connection timeout user-configurable so it can be increased in cases like this, but that does not really solve the underlying problem, since in other cases you may still want an endpoint that takes more than 10 seconds to respond to be reported as an error.
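For reference, reqwest already supports a per-client timeout; a sketch of wiring a hypothetical plan-level timeout setting into the client could look like the following (the plan-level setting itself is an assumption, not an existing drill option):

// Sketch only: `timeout_secs` would come from a hypothetical plan setting.
use std::time::Duration;

fn build_client(timeout_secs: u64) -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(timeout_secs))
        .build()
}

fn main() -> reqwest::Result<()> {
    // e.g. raise the limit above the default for known-slow endpoints
    let _client = build_client(30)?;
    Ok(())
}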
Other information
Running drill built from current master on Windows 10, with an unmodified Rust stable-x86_64-pc-windows-msvc toolchain (rustc 1.43.1).
Thank you so much for this detailed report. :100: It is true that this can be a common case under high load. Right now we throw a panic because the original use case was to catch typos and wrongly defined plans like:
concurrency: 10
base: 'http://127.0.0.1:8000'
iterations: 1

plan:
  - name: Get Id
    request:
      url: /endpoint/
      method: GET
      headers:
        Content-Type: 'application/json'
    assign: foo

  - name: Get Id
    request:
      url: /endpoint_2/{{bar.body.id}}
      method: GET
I think the best approach, as you said, is:
- Abort the rest of the iteration, because it is really hard to determine all the dependencies between subsequent requests.
- Improve the message and be more user friendly during the run, instead of panicking.
- Collect the affected requests as skipped and show a summary at the end (see the sketch after this list).
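A rough sketch of what "collect as skipped" could mean, using hypothetical types rather than drill's actual reporting code:

// Hypothetical sketch: a per-request outcome that can record skipped calls
// instead of aborting the whole run.
enum Outcome {
    Success { status: u16 },
    Failed { error: String },
    Skipped { reason: String },
}

struct ReportEntry {
    name: String,
    outcome: Outcome,
}

fn main() {
    // Simulate the scenario from this issue: the first request times out,
    // so the second one is skipped rather than reported as a false error.
    let report = vec![
        ReportEntry {
            name: "Get Id".into(),
            outcome: Outcome::Failed { error: "connection timed out".into() },
        },
        ReportEntry {
            name: "Get Id".into(),
            outcome: Outcome::Skipped {
                reason: "'resp.body.id' was never assigned".into(),
            },
        },
    ];

    // A final summary can then report successes, failures and skips separately.
    for entry in &report {
        match &entry.outcome {
            Outcome::Success { status } => println!("{}: {}", entry.name, status),
            Outcome::Failed { error } => println!("{}: failed ({})", entry.name, error),
            Outcome::Skipped { reason } => println!("{}: skipped ({})", entry.name, reason),
        }
    }
}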
Great! Yeah, I agree with the approach you outline.