Counterintuitive `warp` error for `Unsupported endpoint version`
Description
When hitting the /eth/v2/validator/blocks/{slot} endpoint with an invalid randao_reveal (and verify_randao=true or empty), Lighthouse returns this error:
$ curl "http://localhost:5052/eth/v2/validator/blocks/1000000"
{"code":400,"message":"BAD_REQUEST: Unsupported endpoint version: v2","stacktraces":[]}
This makes zero sense, as the endpoint is clearly declared as supporting v2 here:
https://github.com/sigp/lighthouse/blob/2983235650811437b44199f9c94e517e948a1e9b/beacon_node/http_api/src/lib.rs#L1928-L1935
Adding a panic to unsupported_version_rejection to get a backtrace shows some funky warp stuff going on:
thread 'tokio-runtime-worker' panicked at 'unsupported_version_rejection: v2', beacon_node/http_api/src/version.rs:60:5
stack backtrace:
0: rust_begin_unwind
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
2: http_api::version::unsupported_version_rejection
3: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
4: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
5: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
6: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
7: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
8: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
9: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
10: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
11: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
12: <F as futures_core::future::TryFuture>::try_poll
13: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
14: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
15: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
16: <warp::filter::recover::RecoverFuture<T,F> as core::future::future::Future>::poll
17: <warp::filters::log::internal::WithLogFuture<FN,F> as core::future::future::Future>::poll
18: <warp::filters::cors::internal::WrappedFuture<F> as core::future::future::Future>::poll
19: scoped_tls::ScopedKey<T>::set
20: <warp::filter::service::FilteredFuture<F> as core::future::future::Future>::poll
21: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
22: <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
23: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
24: tokio::runtime::task::core::CoreStage<T>::poll
25: tokio::runtime::task::harness::Harness<T,S>::poll
26: std::thread::local::LocalKey<T>::with
27: tokio::runtime::thread_pool::worker::Context::run_task
28: tokio::runtime::thread_pool::worker::Context::run
29: tokio::macros::scoped_tls::ScopedKey<T>::set
30: tokio::runtime::thread_pool::worker::run
31: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
32: tokio::runtime::task::harness::Harness<T,S>::poll
33: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
i.e. it seems that warp is doing some counterintuitive backtracking through a few and_then/and/or combinators until it hits the unsupported_version_rejection for a different endpoint (one that is v1-only). This makes little sense to me, and would probably require someone to dig pretty deep into warp to understand why it happens. I think this is similar to another open issue (https://github.com/sigp/lighthouse/issues/3112) related to our apparent misuse of warp combinators.
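To make the suspected backtracking concrete, here's a minimal standalone sketch (not Lighthouse code; the routes and rejection types are invented for illustration). Two or-ed routes match the same path; when the first handler rejects, warp retries the second, and the second route's rejection can end up being the one reported to the client:
use warp::{Filter, Rejection};

#[derive(Debug)]
struct InvalidRandao;
impl warp::reject::Reject for InvalidRandao {}

#[derive(Debug)]
struct UnsupportedVersion;
impl warp::reject::Reject for UnsupportedVersion {}

#[tokio::main]
async fn main() {
    // A handler that rejects on bad input (like an invalid randao_reveal).
    let blocks_v2 = warp::path!("eth" / String / "blocks").and_then(|_version: String| async {
        Err::<String, Rejection>(warp::reject::custom(InvalidRandao))
    });

    // A sibling handler for the same path that only supports v1.
    let blocks_v1 = warp::path!("eth" / String / "blocks").and_then(|version: String| async move {
        if version == "v1" {
            Ok("v1 block".to_string())
        } else {
            Err(warp::reject::custom(UnsupportedVersion))
        }
    });

    // GET /eth/v2/blocks: `blocks_v2` matches but its handler rejects, so warp
    // backtracks into `blocks_v1`, which rejects with UnsupportedVersion. Both
    // rejections end up combined, and a `recover` handler may report the
    // "wrong" one, much like the error in this issue.
    let routes = blocks_v2.or(blocks_v1);
    warp::serve(routes).run(([127, 0, 0, 1], 3030)).await;
}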
Finally, if verify_randao=false is set or the randao reveal is valid, then no error occurs:
$ curl "http://localhost:5052/eth/v2/validator/blocks/3574858?verify_randao=false"
{ .. }
Version
v2.5.0
Related issues
- https://github.com/sigp/lighthouse/issues/3112
It may be that we want to avoid and_then for our handlers and instead use then. The warp docs state:
Rejections are meant to say “this filter didn’t accept the request, maybe another can”. So for application-level errors, consider using Filter::then instead.
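For example (a sketch; produce_block is a hypothetical stand-in for the real handler logic), converting a route from and_then to then means errors become responses inside the handler:
use warp::{http::StatusCode, reply::Reply, Filter};

// Hypothetical fallible handler standing in for real block production.
fn produce_block(slot: u64) -> Result<String, String> {
    if slot % 2 == 0 {
        Ok(format!("block at slot {slot}"))
    } else {
        Err("invalid randao reveal".to_string())
    }
}

fn route() -> impl Filter<Extract = impl Reply, Error = warp::Rejection> + Clone {
    warp::path!("eth" / "v2" / "validator" / "blocks" / u64).then(|slot: u64| async move {
        match produce_block(slot) {
            Ok(block) => warp::reply::json(&block).into_response(),
            // The error becomes a response right here, so warp never sees a
            // Rejection and has no reason to try other routes.
            Err(msg) => warp::reply::with_status(msg, StatusCode::BAD_REQUEST).into_response(),
        }
    })
}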
I'll look into this one and follow up, thx
It may be that we want to avoid and_then for our handlers and instead use then.
@jmcph4 and I got this working on https://github.com/sigp/lighthouse/pull/4316. Our approach avoids returning a warp::reject::Reject for errors in the handlers themselves. For now we're converting the rejections into error responses using handle_rejection, like this:
https://github.com/sigp/lighthouse/blob/73b30a64472455fa555896b70559b5a55c335db3/beacon_node/http_api/src/lib.rs#L1232-L1264
Jack will fix just the new v2/blocks endpoints in that PR, and this issue can remain open until we roll out the fix across the codebase. We might want to write a small abstraction to minimise the amount of code duplication (something like then_or_reject, then_or_error).
@michaelsproul Thanks for helping get this one moving. Been a bit busy lately but should be freeing up more in the immediate future.
I'm not sure how a rejection handler solves our initial issue of warp mixing up endpoint-specific logic... If you've got any advice/guidance there it would be much appreciated.
We might want to write a small abstraction to minimise the amount of code duplication (something like then_or_reject, then_or_error).
Also, do you mind clarifying how this would look?
Hey @muang0, the pattern we found that works is this one:
https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/beacon_node/http_api/src/lib.rs#L1248-L1274
i.e. change the and_then to a then so that there's no possibility of returning a Rejection. This prevents warp from backtracking when a handler errors.
The complication is that then expects a Response as the return value, not a Result<Response, Rejection>, so we need to convert the Rejections that were previously returned into responses, which is what that warp_utils::reject::handle_rejection call accomplishes. The abstraction I'm thinking of is something like:
some_filter.then_or_error(|| async {
    // this closure calls `f`, which returns a `Result<T, Rejection>`; `then_or_error`
    // will convert that into a `Response`. We can hopefully do a find-and-replace of
    // `and_then` for `then_or_error`, with minimal other changes.
    f().await
})
The implementation of then_or_error might look like this:
use std::future::Future;
use warp::{
    filters::BoxedFilter, http::StatusCode, reject::Rejection,
    reply::{Reply, Response}, Filter,
};

// Mixin trait (add methods to all warp types implementing Filter). For
// simplicity this sketch only covers filters that extract nothing.
pub trait ThenOrError:
    Filter<Extract = (), Error = Rejection> + Clone + Send + Sync + Sized + 'static
{
    fn then_or_error<G, Fut, R>(self, g: G) -> BoxedFilter<(Response,)>
    where
        G: Fn() -> Fut + Clone + Send + Sync + 'static,
        Fut: Future<Output = Result<R, Rejection>> + Send,
        R: Reply,
    {
        self.then(move || {
            let g = g.clone();
            async move {
                match g().await {
                    Ok(result) => result.into_response(),
                    // Convert the Rejection into a Response here rather than
                    // letting warp backtrack into another filter.
                    Err(e) => match warp_utils::reject::handle_rejection(e).await {
                        Ok(reply) => reply.into_response(),
                        Err(_) => warp::reply::with_status(
                            "unhandled error",
                            StatusCode::INTERNAL_SERVER_ERROR,
                        )
                        .into_response(),
                    },
                }
            }
        })
        .boxed()
    }
}

impl<F> ThenOrError for F where
    F: Filter<Extract = (), Error = Rejection> + Clone + Send + Sync + 'static
{
}
This might require some tweaks to get working.
We are also working on a big overhaul of the HTTP API here: https://github.com/sigp/lighthouse/pull/4462, so it might make sense to let that stabilise before rolling out this change (or we can raise a PR based on #4462 and merge it afterwards). I'm away for the next two weeks, but if you feel like working on it in the meantime, I hope this is enough info to make progress.
Thanks for the clarification @michaelsproul
I think we may be able to solve this using warp-native constructs (see example here). I'll play around with it a bit tomorrow.
In the meantime, can anyone provide context around the choice of a custom uor rather than or (beyond the in-line comment below)?
This is a shorthand for self.or(other).unify().boxed(), which is useful because it keeps the filter type simple and prevents type-checker explosions.
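For reference, a sketch of the shape that shorthand might take, assuming the definition in the quoted comment (the real trait bounds in Lighthouse may differ):
use warp::{filters::BoxedFilter, Filter, Rejection};

// Both branches must extract the same single value for `unify` to apply.
pub trait UOr<T: Send + 'static>:
    Filter<Extract = (T,), Error = Rejection> + Clone + Send + Sync + Sized + 'static
{
    fn uor<F>(self, other: F) -> BoxedFilter<(T,)>
    where
        F: Filter<Extract = (T,), Error = Rejection> + Clone + Send + Sync + 'static,
    {
        // `unify` collapses `Either<(T,), (T,)>` back into `(T,)`, and `boxed`
        // erases the combined type so it can't snowball across many `or`s.
        self.or(other).unify().boxed()
    }
}

impl<T: Send + 'static, F> UOr<T> for F where
    F: Filter<Extract = (T,), Error = Rejection> + Clone + Send + Sync + 'static
{
}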
Is now an appropriate time for us to face the 'type-checker explosions'?
Hey @muang0, I don't think recover on its own is sufficient. We are already using it here:
https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/beacon_node/http_api/src/lib.rs#L3945
The problem is the and_then: it will backtrack and try a different handler if the handler returns a Rejection, which is exactly what we don't want. We could maybe achieve the same thing as then_or_error by sprinkling recovers immediately after every and_then. Is that what you had in mind?
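For concreteness, that per-route sprinkling might look something like this sketch (the route and rejection names are hypothetical):
use warp::{http::StatusCode, reply::Reply, Filter, Rejection};

#[derive(Debug)]
struct BadRandao;
impl warp::reject::Reject for BadRandao {}

// Map our custom rejection to a response; pass anything else through.
async fn recover_bad_randao(err: Rejection) -> Result<impl Reply, Rejection> {
    if err.find::<BadRandao>().is_some() {
        Ok(warp::reply::with_status("bad randao reveal", StatusCode::BAD_REQUEST))
    } else {
        Err(err)
    }
}

fn route() -> impl Filter<Extract = impl Reply, Error = Rejection> + Clone {
    warp::path!("eth" / "v2" / "example")
        .and_then(|| async { Err::<String, Rejection>(warp::reject::custom(BadRandao)) })
        // recover immediately after the handler, before any `or`, hoping to
        // convert the rejection into a response before warp can backtrack
        .recover(recover_bad_randao)
}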
Is now an appropriate time for us to face the 'type-checker explosions?'
I don't think there's a better solution, but I'm willing to be proved wrong. The boxing greatly simplifies the types and prevents us from hitting the compiler's recursion limit on massive types like Or<Or<Or<Or<Or<Or<....>>>>>>. It mostly worked fine without the boxing before, except that new compiler releases would subtly change the recursion limit behaviour and cause compilation to fail. We haven't had any compilation issues since adopting the boxing.
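A toy example of the boxing pattern (not Lighthouse code):
use warp::{filters::BoxedFilter, Filter};

// Without `.boxed()`, `routes` would have a type like Or<Or<A, B>, C> that
// deepens with every route added; boxing keeps it `BoxedFilter<(&'static str,)>`
// no matter how many routes we chain on.
fn routes() -> BoxedFilter<(&'static str,)> {
    let a = warp::path("a").map(|| "a").boxed();
    let b = warp::path("b").map(|| "b").boxed();
    let c = warp::path("c").map(|| "c").boxed();
    a.or(b).unify().or(c).unify().boxed()
}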
We could maybe achieve the same thing as then_or_error by sprinkling recovers immediately after every and_then, is that what you had in mind?
I wasn't thinking of this approach in particular, but I think it would be preferable to writing a custom mixin.
new compiler releases would subtly change the recursion limit behaviour and cause compilation to fail. We haven't had any compilation issues since adopting the boxing.
I'm following you; that's a neat workaround for the compiler issues.
Assuming no concerns, I'll update an endpoint to use recover() after the and_then; then we can see how it looks/behaves.
~Testing blocked on an interesting CloudFront error. I'll give it some time before retrying and, failing that, opening crates.io/AWS support tickets~ unblocked
bender@Jamess-MacBook-Air lighthouse % make install-lcli
cargo install --path lcli --force --locked \
--features "jemalloc" \
--profile "release" \
Installing lcli v4.3.0 (/Users/bender/Projects/fun/lighthouse/lcli)
Updating crates.io index
error: download of ha/sh/hashers failed
Caused by:
failed to get successful HTTP response from `https://index.crates.io/ha/sh/hashers` (18.165.83.65), got 421
debug headers:
x-amz-cf-pop: IAD12-P1
x-cache: Error from cloudfront
x-amz-cf-pop: IAD55-P3
x-amz-cf-id: bK5gURNzbur0Xk8kh7XHscLP0xLRes6LhH_hq4Ez1R5gcSHN8ir2cQ==
body:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>421 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
The distribution does not match the certificate for which the HTTPS connection was established with.
We can't connect to the server for this app or website at this …
make: *** [install-lcli] Error 101
~Running into an issue starting the local testnet; it looks like the clients are expecting a deploy_block.txt file that's not being created. Will investigate more tomorrow.~ unblocked
Update: Adding recover() to each filter didn't solve the backtracking issue. I'll try the Mixin approach next.
@muang0 Are you still working on this? If you're stuck I'd like to give it a try, particularly now that #4462 is merged.
Sorry @muang0, I ended up just going ahead and doing it. We need this fix quite soon for our next release, and we were getting some stack overflows when running debug tests that we thought this might help with as well.
PR https://github.com/sigp/lighthouse/pull/4595
I realised we need to repeat the same fix for the VC API, maybe you could tackle that @muang0? See https://github.com/sigp/lighthouse/issues/4597
@michaelsproul I'm very sorry for the delayed response from my end; I've had a rough summer with a couple of things that came up unexpectedly and needed my full attention for the past month. I'm just now back in my normal environment and getting my legs underneath me again, so to speak. I noticed #4597 has someone working on it already. If there's anything else lightweight and low-priority that could use help, I'm happy to take a look; otherwise I'll keep an eye on the backlog for a good first issue to pop up.