wadm
[BUG] Multiple hosts e2e integration test is flaky when deleting the link
When I print the links to stderr, I can see that the link still exists. This is likely an issue with the undeploy logic for an application, probably some kind of race condition where the delete command is never sent, the link is never deleted, or cleanup otherwise does not happen. Needs more investigation.
Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
test run_multiple_host_tests has been running for over 60 seconds
Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
thread 'run_multiple_host_tests' panicked at tests/e2e.rs:322:9:
Failed to get ok response from check: The link between the http provider and hello component should be removed
stack backtrace:
0: rust_begin_unwind
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
1: core::panicking::panic_fmt
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
2: e2e_multiple_hosts::e2e::assert_status::{{closure}}
at ./tests/e2e.rs:322:9
3: e2e_multiple_hosts::test_no_requirements::{{closure}}
at ./tests/e2e_multiple_hosts.rs:231:6
4: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123:9
5: e2e_multiple_hosts::run_multiple_host_tests::{{closure}}
at ./tests/e2e_multiple_hosts.rs:93:48
6: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123:9
7: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:63
8: tokio::runtime::coop::with_budget
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
9: tokio::runtime::coop::budget
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
10: tokio::runtime::park::CachedParkThread::block_on
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:31
11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/blocking.rs:66:9
12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
13: tokio::runtime::context::runtime::enter_runtime
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
15: tokio::runtime::runtime::Runtime::block_on
at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/runtime.rs:351:45
16: e2e_multiple_hosts::run_multiple_host_tests
at ./tests/e2e_multiple_hosts.rs:103:5
17: e2e_multiple_hosts::run_multiple_host_tests::{{closure}}
at ./tests/e2e_multiple_hosts.rs:35:35
18: core::ops::function::FnOnce::call_once
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
19: core::ops::function::FnOnce::call_once
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test run_multiple_host_tests ... FAILED
failures:
failures:
run_multiple_host_tests
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 121.49s
error: test failed, to rerun pass `--test e2e_multiple_hosts`
make: *** [test-individual-e2e] Error 101
If I had to guess, this is a difficult-to-hit race condition that shows up when running multiple wadm instances and undeploying a manifest. It probably has to do with each wadm instance publishing cleanup events: in rare cases, a scaler running on one of the wadm instances has outdated state and decides it does not need to publish a link delete command when the link in fact still exists.
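To make the hypothesis above concrete, here is a minimal, std-only Rust simulation of the suspected race. It is a sketch, not wadm code: `LinkKey`, `Command`, `cleanup_commands` and `undeploy_handled_by` are hypothetical names, and it assumes that the cleanup for a given undeploy is handled by one wadm instance acting only on its own local view of the lattice. Under that assumption, the same undeploy removes the link when an up-to-date instance handles it and leaves the link behind when a stale instance handles it, which matches the symptom in the log.

```rust
use std::collections::HashSet;

// Hypothetical, simplified link identity built from fields shown in the log
// (source_id, target, name). This is NOT wadm's InterfaceLinkDefinition.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct LinkKey {
    source_id: String,
    target: String,
    name: String,
}

// Hypothetical command a scaler might publish while cleaning up.
#[derive(Debug, PartialEq, Eq)]
enum Command {
    DeleteLink(LinkKey),
}

// The scaler decides what to publish from its *local* view, not from the
// real lattice. If the view lacks the link (stale state), nothing is emitted.
fn cleanup_commands(local_view: &HashSet<LinkKey>, link: &LinkKey) -> Vec<Command> {
    if local_view.contains(link) {
        vec![Command::DeleteLink(link.clone())]
    } else {
        Vec::new()
    }
}

// Applies published commands to the real lattice state.
fn apply(lattice: &mut HashSet<LinkKey>, commands: &[Command]) {
    for cmd in commands {
        match cmd {
            Command::DeleteLink(key) => {
                lattice.remove(key);
            }
        }
    }
}

// Assumption: exactly one wadm instance (with view `local_view`) handles the
// cleanup for this undeploy. Returns the links left in the lattice afterwards.
fn undeploy_handled_by(
    local_view: &HashSet<LinkKey>,
    lattice: &HashSet<LinkKey>,
    link: &LinkKey,
) -> HashSet<LinkKey> {
    let mut lattice = lattice.clone();
    let commands = cleanup_commands(local_view, link);
    apply(&mut lattice, &commands);
    lattice
}

fn main() {
    let link = LinkKey {
        source_id: "http_server".to_string(),
        target: "http_hello_world".to_string(),
        name: "default".to_string(),
    };
    // The real lattice still contains the link when the undeploy starts.
    let lattice: HashSet<LinkKey> = HashSet::from([link.clone()]);

    // One instance has observed the link; the other has a stale, empty view.
    let fresh_view = lattice.clone();
    let stale_view: HashSet<LinkKey> = HashSet::new();

    let after_fresh = undeploy_handled_by(&fresh_view, &lattice, &link);
    let after_stale = undeploy_handled_by(&stale_view, &lattice, &link);

    println!("handled by up-to-date instance: {} link(s) left", after_fresh.len());
    println!("handled by stale instance: {} link(s) left", after_stale.len());
}
```

Because the decision is made from the local view alone, the outcome depends on which instance happens to process the cleanup, which would explain why the failure is rare and not reproducible on every run.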