Adjust Worker::poll logic to fix pending wake failure in balance service.
#824
This commit addresses a problem where Worker::poll_next_msg can block service.poll_ready. This matters most when the service is a Balance service: when multiple pending endpoints become ready, the task stays blocked in Worker::poll_next_msg if no further message arrives, which causes these endpoints to be disconnected.
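To make the ordering problem concrete, here is a minimal std-only sketch. It is a toy model, not the actual tower `Buffer` worker or `Balance` code: `ToyBalance`, `ToyWorker`, `poll_message_first` and `poll_ready_first` are hypothetical names, and the only assumption carried over from the description is that pending endpoints are promoted to ready solely when the service's `poll_ready` is called.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for a balancer: endpoints move from `pending`
/// to `ready` only when `poll_ready` is called.
struct ToyBalance {
    pending: usize,
    ready: usize,
}

impl ToyBalance {
    fn poll_ready(&mut self) -> Poll<()> {
        // Promote every pending endpoint, then report readiness.
        self.ready += self.pending;
        self.pending = 0;
        if self.ready > 0 { Poll::Ready(()) } else { Poll::Pending }
    }
}

/// Hypothetical stand-in for the buffer worker task.
struct ToyWorker {
    queue: VecDeque<u32>,
    service: ToyBalance,
    handled: Vec<u32>,
}

impl ToyWorker {
    fn new(pending: usize) -> Self {
        ToyWorker {
            queue: VecDeque::new(),
            service: ToyBalance { pending, ready: 0 },
            handled: Vec::new(),
        }
    }

    /// Message-first ordering: the service is polled only once a message
    /// is available, so an empty queue starves `poll_ready`.
    fn poll_message_first(&mut self) -> Poll<()> {
        let Some(msg) = self.queue.pop_front() else {
            return Poll::Pending;
        };
        match self.service.poll_ready() {
            Poll::Ready(()) => {
                self.handled.push(msg);
                Poll::Ready(())
            }
            Poll::Pending => {
                // Put the message back and wait for the service.
                self.queue.push_front(msg);
                Poll::Pending
            }
        }
    }

    /// Readiness-first ordering: the service is driven on every poll,
    /// whether or not a message is waiting.
    fn poll_ready_first(&mut self) -> Poll<()> {
        if self.service.poll_ready().is_pending() {
            return Poll::Pending;
        }
        match self.queue.pop_front() {
            Some(msg) => {
                self.handled.push(msg);
                Poll::Ready(())
            }
            None => Poll::Pending,
        }
    }
}

fn main() {
    // Two endpoints are waiting to be promoted and the queue is empty.
    let mut stuck = ToyWorker::new(2);
    let _ = stuck.poll_message_first();
    // prints "message-first: pending=2 ready=0"
    println!("message-first: pending={} ready={}", stuck.service.pending, stuck.service.ready);

    let mut fixed = ToyWorker::new(2);
    let _ = fixed.poll_ready_first();
    // prints "readiness-first: pending=0 ready=2"
    println!("readiness-first: pending={} ready={}", fixed.service.pending, fixed.service.ready);
}
```

With an empty queue, the message-first worker never reaches the service, so the two endpoints stay pending; the readiness-first worker promotes them even though it has nothing to dispatch yet.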
Thanks! Looks like CI failed...
Also, is there a way to update the tests to check this change?
Thanks for your attention, I have updated the tests to check this change.
@cratelyn do yall have a more extensive suite that includes balancing that can test this change out?
we do! i'll try running the linkerd2-proxy tests with this patch applied, pardon the wait.
if this change is focused on solving how this middleware interacts with a Balance service, i might suggest adding a test case that exercises the buffer with a Balance service.
@seanmonstar i wasn't able to get a patched version of the linkerd2-proxy building with this patch. we're on tonic v0.12 at the moment, which depends on tower 0.4, and thus builds fail if this branch is used.
i did leave some questions above, however. i'm concerned that trace contexts no longer seem to be propagated, and that this change seems to change the behavior of Buffer. having debugged changes in behavior introduced by other changes to Buffer, e.g. #635, i'm very hesitant about this pr.
@cratelyn you've since ported all of linkerd to use the newer hyper/tonic, I think, right? If it'd be easier to pop this branch and see if the test suite is still happy, that could help move this along. If you're busy, understandable!
i ran the linkerd-proxy test suite against this branch, via a [patch.crates-io] directive.
tests did not pass, due to a new failure in this test: https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/stack/src/loadshed.rs#L133-L198
test loadshed::tests::buffer_load_shed ... FAILED
failures:
---- loadshed::tests::buffer_load_shed stdout ----
[ 0.000764s] DEBUG worker{id=oneshot4}: linkerd_stack::loadshed: Service has become unavailable
[ 0.000786s] DEBUG worker{id=oneshot4}: linkerd_stack::loadshed: Service shedding load
[ 0.000896s] DEBUG worker{id=oneshot6}: linkerd_stack::loadshed: Service has become unavailable
[ 0.000910s] DEBUG worker{id=oneshot6}: linkerd_stack::loadshed: Service shedding load
thread 'loadshed::tests::buffer_load_shed' panicked at linkerd/stack/src/loadshed.rs:181:9:
ready; value = Err(LoadShedError(()))
stack backtrace:
0: 0x55b0c0df6ab2 - std::backtrace_rs::backtrace::libunwind::trace::h74680e970b6e0712
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
1: 0x55b0c0df6ab2 - std::backtrace_rs::backtrace::trace_unsynchronized::ha3bf590e3565a312
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:14
2: 0x55b0c0df6ab2 - std::sys::backtrace::_print_fmt::hcf16024cbdd6c458
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/backtrace.rs:66:9
3: 0x55b0c0df6ab2 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h46a716bba2450163
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/backtrace.rs:39:26
4: 0x55b0c0e1b5b3 - core::fmt::rt::Argument::fmt::ha695e732309707b7
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/fmt/rt.rs:181:76
5: 0x55b0c0e1b5b3 - core::fmt::write::h275e5980d7008551
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/fmt/mod.rs:1446:25
6: 0x55b0c0df3f63 - std::io::default_write_fmt::h31683a0a922ca2b7
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/io/mod.rs:639:11
7: 0x55b0c0df3f63 - std::io::Write::write_fmt::hfb552b13b10253dc
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/io/mod.rs:1914:13
8: 0x55b0c0df6902 - std::sys::backtrace::BacktraceLock::print::hafb9d5969adc39a0
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/backtrace.rs:42:9
9: 0x55b0c0df7fec - std::panicking::default_hook::{{closure}}::hae2e97a5c4b2b777
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:300:22
10: 0x55b0c0df7e42 - std::panicking::default_hook::h3db1b505cfc4eb79
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:324:9
11: 0x55b0c0cf0374 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hb81979808caba656
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1980:9
12: 0x55b0c0cf0374 - test::test_main_with_exit_callback::{{closure}}::h830f77309e9e8595
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:145:21
13: 0x55b0c0df8a63 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hd620b4648521795b
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1980:9
14: 0x55b0c0df8a63 - std::panicking::rust_panic_with_hook::h409da73ddef13937
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:841:13
15: 0x55b0c0df873a - std::panicking::begin_panic_handler::{{closure}}::h159b61b27f96a9c2
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:706:13
16: 0x55b0c0df6fa9 - std::sys::backtrace::__rust_end_short_backtrace::h5b56844d75e766fc
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/backtrace.rs:168:18
17: 0x55b0c0df83cd - __rustc[4794b31dd7191200]::rust_begin_unwind
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:697:5
18: 0x55b0c09c48f0 - core::panicking::panic_fmt::hc8737e8cca20a7c8
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/panicking.rs:75:14
19: 0x55b0c0a20ecf - linkerd_stack::loadshed::tests::buffer_load_shed::{{closure}}::h4c89fe60d39fa5fe
at /home/katie/linkerd/linkerd2-proxy/linkerd/stack/src/loadshed.rs:181:9
20: 0x55b0c09cf3d2 - <core::pin::Pin<P> as core::future::future::Future>::poll::h02712d90cefba1f7
at /home/katie/.rustup/toolchains/1.88.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9
21: 0x55b0c09cf68d - <core::pin::Pin<P> as core::future::future::Future>::poll::hf04c6c1994405935
at /home/katie/.rustup/toolchains/1.88.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9
22: 0x55b0c09ceedf - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}::hab5da2eeab9b2e73
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:742:54
23: 0x55b0c09cee35 - tokio::task::coop::with_budget::h3939f8a60371c392
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/task/coop/mod.rs:167:5
24: 0x55b0c09cee35 - tokio::task::coop::budget::hec3d193970d2c2ec
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/task/coop/mod.rs:133:5
25: 0x55b0c09cee35 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::hf8ca5f9c6c2994fc
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:742:25
26: 0x55b0c09cc0f0 - tokio::runtime::scheduler::current_thread::Context::enter::hce05c34ff5e647db
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:432:19
27: 0x55b0c09cd4ed - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::h5ebc14e706b4f429
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:741:36
28: 0x55b0c09cd1c4 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::hebe09deb4ec8fbe0
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:829:68
29: 0x55b0c09c66fb - tokio::runtime::context::scoped::Scoped<T>::set::h7422eebb8cd227b1
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/context/scoped.rs:40:9
30: 0x55b0c09c5679 - tokio::runtime::context::set_scheduler::{{closure}}::haa4cf9f160db5b3b
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/context.rs:176:26
31: 0x55b0c0a697b2 - std::thread::local::LocalKey<T>::try_with::h8a871093fe078251
at /home/katie/.rustup/toolchains/1.88.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:315:12
32: 0x55b0c0a681de - std::thread::local::LocalKey<T>::with::h21f56d9dd6966204
at /home/katie/.rustup/toolchains/1.88.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:279:15
33: 0x55b0c09c55ad - tokio::runtime::context::set_scheduler::h228efc4c12bfb4a4
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/context.rs:176:9
34: 0x55b0c09ccf50 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::h80625cc959d59981
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:829:27
35: 0x55b0c09cd1e3 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::h59dec3f7a6b6328b
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:729:19
36: 0x55b0c09c7ea9 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::hcf7b0d87b5a8e0a7
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:200:28
37: 0x55b0c0a5b078 - tokio::runtime::context::runtime::enter_runtime::h876de5a43a6df25e
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/context/runtime.rs:65:16
38: 0x55b0c09c77f1 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::hbbd1aa9f79ee7da5
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/scheduler/current_thread/mod.rs:188:9
39: 0x55b0c0a22d69 - tokio::runtime::runtime::Runtime::block_on_inner::h502017307180bd8a
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/runtime.rs:356:47
40: 0x55b0c0a2304f - tokio::runtime::runtime::Runtime::block_on::h99870153e47504f7
at /home/katie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.1/src/runtime/runtime.rs:330:13
41: 0x55b0c0a64611 - linkerd_stack::loadshed::tests::buffer_load_shed::h1f49f63d00c78621
at /home/katie/linkerd/linkerd2-proxy/linkerd/stack/src/loadshed.rs:197:9
42: 0x55b0c0a1fdf7 - linkerd_stack::loadshed::tests::buffer_load_shed::{{closure}}::h784ff89a30b6395b
at /home/katie/linkerd/linkerd2-proxy/linkerd/stack/src/loadshed.rs:134:32
43: 0x55b0c09fee16 - core::ops::function::FnOnce::call_once::h18e57c5b72a2bce5
at /home/katie/.rustup/toolchains/1.88.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
44: 0x55b0c0cf5b9b - core::ops::function::FnOnce::call_once::h6830b7b483df2d7b
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/ops/function.rs:250:5
45: 0x55b0c0cf5b9b - test::__rust_begin_short_backtrace::h6ad576a367cba051
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:648:18
46: 0x55b0c0cf4d82 - test::run_test_in_process::{{closure}}::h282029c456bdb1d0
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:671:60
47: 0x55b0c0cf4d82 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h92c3da85f1d7f07a
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/panic/unwind_safe.rs:272:9
48: 0x55b0c0cf4d82 - std::panicking::try::do_call::h569264ff5d41e944
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:589:40
49: 0x55b0c0cf4d82 - std::panicking::try::h3253fc2f0f6f9e29
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:552:19
50: 0x55b0c0cf4d82 - std::panic::catch_unwind::had653e4cb2e12066
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panic.rs:359:14
51: 0x55b0c0cf4d82 - test::run_test_in_process::hd1ecf063ce636af0
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:671:27
52: 0x55b0c0cf4d82 - test::run_test::{{closure}}::h2f9e350abac1b079
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:592:43
53: 0x55b0c0cb93a4 - test::run_test::{{closure}}::hed24df14dd589f4e
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/test/src/lib.rs:622:41
54: 0x55b0c0cb93a4 - std::sys::backtrace::__rust_begin_short_backtrace::he0330b8283c070fc
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/backtrace.rs:152:18
55: 0x55b0c0cbcc9a - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hec3e9e5c0807d052
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/thread/mod.rs:559:17
56: 0x55b0c0cbcc9a - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h1b183baa756e4c0a
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/panic/unwind_safe.rs:272:9
57: 0x55b0c0cbcc9a - std::panicking::try::do_call::h72eba35930bfaae9
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:589:40
58: 0x55b0c0cbcc9a - std::panicking::try::h31af7d64fc54fbca
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panicking.rs:552:19
59: 0x55b0c0cbcc9a - std::panic::catch_unwind::h64fe4d1919153ee2
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/panic.rs:359:14
60: 0x55b0c0cbcc9a - std::thread::Builder::spawn_unchecked_::{{closure}}::h3e55c0af18b31fa4
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/thread/mod.rs:557:30
61: 0x55b0c0cbcc9a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hefd468255a79af59
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/core/src/ops/function.rs:250:5
62: 0x55b0c0dfa3bb - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::he4962534b56a5929
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1966:9
63: 0x55b0c0dfa3bb - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h95af12d5a868b9d0
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1966:9
64: 0x55b0c0dfa3bb - std::sys::pal::unix::thread::Thread::new::thread_start::h1822d22fde68314f
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/pal/unix/thread.rs:97:17
65: 0x7fadbd622272 - start_thread
66: 0x7fadbd69ddec - clone3
67: 0x0 - <unknown>
failures:
loadshed::tests::buffer_load_shed
test result: FAILED. 25 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
i have not dug further into the cause, but i hope this helps!