Benchmark panics if an endpoint that assigns a value times out
I've encountered an issue where an endpoint whose response I assign to a variable (and depend on later in the plan) times out. This causes the whole benchmark to panic with the following message:
(output from the concurrent worker threads is interleaved; the distinct messages are:)

Error connecting 'http://127.0.0.1:8000/endpoint/': reqwest::Error { kind: Request, url: "http://127.0.0.1:8000/endpoint/", source: TimedOut }
thread 'tokio-runtime-worker' panicked at 'Unknown 'resp.body.id' variable!', src\interpolator.rs:39:11
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', src\benchmark.rs:89:106
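The last line is why the whole run aborts: the worker task panics, and unwrapping its join result re-raises the panic on the main thread. A minimal, self-contained illustration of that mechanism (not drill's actual code; assumes tokio with the "macros" and "rt-multi-thread" features) looks like this:

// Illustration only: a panic inside a spawned worker task becomes a
// JoinError::Panic that aborts the main thread when the join result is
// unwrapped, which matches the last line of the log above.
#[tokio::main]
async fn main() {
    let handle = tokio::spawn(async {
        // Plays the role of the interpolator panic in the worker thread.
        panic!("Unknown 'resp.body.id' variable!");
    });

    // Unwrapping the join result (as drill does in src/benchmark.rs)
    // turns the worker's panic into JoinError::Panic(...) on the main thread.
    handle.await.unwrap();
}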
Minimal reproducible example
Set up an actix server like this:
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder, Result};
use serde_json::json;
use std::time::Duration;

#[get("/endpoint/")]
async fn endpoint() -> Result<impl Responder> {
    // <--- NB! the 12 s delay exceeds drill's 10 s request timeout
    actix_rt::time::delay_for(Duration::from_secs(12)).await;
    Ok(HttpResponse::Ok().json(json!({ "id": 1 })))
}

#[get("/endpoint_2/{id}")]
async fn endpoint_post(id: web::Path<u32>) -> Result<impl Responder> {
    println!("Got: {}", *id);
    Ok(HttpResponse::Ok().json(json!({ "status": "ok" })))
}

async fn app() -> std::io::Result<()> {
    HttpServer::new(move || {
        App::new()
            .service(endpoint)
            .service(endpoint_post)
    })
    .bind("127.0.0.1:8000")?
    .client_timeout(0)
    .run()
    .await
}

fn main() {
    let mut system = actix_rt::System::new("server1");
    match system.block_on(app()) {
        Ok(_) => println!("System exited!"),
        Err(e) => println!("{}", e),
    };
}
Cargo.toml
[dependencies]
serde_json = "1.0.53"
actix-rt = "1.1.1"
actix-web = "2.0.0"
Then run a benchmark like this:
concurrency: 10
base: 'http://127.0.0.1:8000'
iterations: 1

plan:
  - name: Get Id
    request:
      url: /endpoint/
      method: GET
      headers:
        Content-Type: 'application/json'
    assign: resp

  - name: Get Id
    request:
      url: /endpoint_2/{{resp.body.id}}
      method: GET
Suggestions
I'm honestly not sure what the best way to handle this is, but I don't think aborting the whole run with a panic is it.
An error in one endpoint may affect another call later in the plan, so one option is to report both calls as errors, or to simply skip a call that depends on a returned value which turned out to be Null, so that a false error is not reported for the second endpoint; it is just one "missing" call. A message could be logged during the run, or the skip could be reported in the final summary.
Right now the value returned from an endpoint timing out is stored as serde_json::Value::Null, see https://github.com/fcsonline/drill/blob/89b790fec1e84d8b8dc367834b2b17b2b3401ea7/src/actions/request.rs#L276-L286.
This value is then retrieved by the Interpolator, which panics if the variable cannot be resolved. If Interpolator::resolve returned an Option<String>, the caller could check for that and report an error instead of panicking when the value is not present.
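A minimal sketch of what that could look like, using hypothetical names; the real code in src/interpolator.rs is likely structured differently:

// Hypothetical sketch, not drill's actual interpolator.
use std::collections::HashMap;
use serde_json::Value;

struct Interpolator<'a> {
    context: &'a HashMap<String, Value>,
}

impl<'a> Interpolator<'a> {
    // Returning Option<String> lets the caller decide how to handle a
    // missing or Null value instead of the interpolator panicking.
    fn resolve(&self, path: &str) -> Option<String> {
        let mut parts = path.split('.');
        let root = parts.next()?;
        let mut value = self.context.get(root)?;

        for key in parts {
            value = value.get(key)?;
        }

        match value {
            Value::Null => None, // a timed-out request stored Null here
            Value::String(s) => Some(s.clone()),
            other => Some(other.to_string()),
        }
    }
}

fn main() {
    let mut ctx = HashMap::new();
    // Simulate what happens after a timeout: the assigned variable is Null.
    ctx.insert("resp".to_string(), Value::Null);

    let interpolator = Interpolator { context: &ctx };

    match interpolator.resolve("resp.body.id") {
        Some(id) => println!("would call /endpoint_2/{}", id),
        None => eprintln!("skipping request: 'resp.body.id' is not available"),
    }
}

With an Option, the benchmark loop can decide whether to skip the dependent request or record an error, instead of the interpolator deciding for it by panicking.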
Another option is to make the connection timeout user-configurable so it can be increased in cases like this, but that does not really solve the underlying problem, since in other cases you may still want an endpoint that takes more than 10 seconds to respond to be reported as an error.
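For reference, reqwest already supports a per-client timeout; a sketch of wiring a hypothetical plan-level timeout setting into the client could look like the following (the plan-level setting itself is an assumption, not an existing drill option):

// Sketch only: `timeout_secs` would come from a hypothetical plan setting.
use std::time::Duration;

fn build_client(timeout_secs: u64) -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(timeout_secs))
        .build()
}

fn main() -> reqwest::Result<()> {
    // e.g. raise the limit above the default for known-slow endpoints
    let _client = build_client(30)?;
    Ok(())
}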
Other information
Running drill built from current master on Windows 10, with an unmodified Rust stable-x86_64-pc-windows-msvc toolchain (rustc 1.43.1).
Thank you so much for this detailed report. :100: It is true that this can be a common case under high load. Right now we throw a panic because the original use case was to catch typos and wrongly defined plans like:
concurrency: 10
base: 'http://127.0.0.1:8000'
iterations: 1

plan:
  - name: Get Id
    request:
      url: /endpoint/
      method: GET
      headers:
        Content-Type: 'application/json'
    assign: foo

  - name: Get Id
    request:
      url: /endpoint_2/{{bar.body.id}}
      method: GET
I think the best approach, as you said, is:
- Abort the rest of the iteration, because it is really hard to determine all the dependencies between subsequent requests.
- Improve the message and be more user friendly during the run, instead of panicking.
- Collect the affected requests as skipped and show a summary at the end (see the sketch after this list).
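A rough sketch of what "collect as skipped" could mean, using hypothetical types rather than drill's actual reporting code:

// Hypothetical sketch: a per-request outcome that can record skipped calls
// instead of aborting the whole run.
enum Outcome {
    Success { status: u16 },
    Failed { error: String },
    Skipped { reason: String },
}

struct ReportEntry {
    name: String,
    outcome: Outcome,
}

fn main() {
    // Simulate the scenario from this issue: the first request times out,
    // so the second one is skipped rather than reported as a false error.
    let report = vec![
        ReportEntry {
            name: "Get Id".into(),
            outcome: Outcome::Failed { error: "connection timed out".into() },
        },
        ReportEntry {
            name: "Get Id".into(),
            outcome: Outcome::Skipped {
                reason: "'resp.body.id' was never assigned".into(),
            },
        },
    ];

    // A final summary can then report successes, failures and skips separately.
    for entry in &report {
        match &entry.outcome {
            Outcome::Success { status } => println!("{}: {}", entry.name, status),
            Outcome::Failed { error } => println!("{}: failed ({})", entry.name, error),
            Outcome::Skipped { reason } => println!("{}: skipped ({})", entry.name, reason),
        }
    }
}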
Great! Yeah, I agree with the approach you outline.