
[BUG] Multiple hosts e2e integration test is flaky when deleting the link

Open brooksmtownsend opened this issue 1 year ago • 1 comment

When printing the links to stderr, I can see that the link still exists. This likely points to a race condition in the undeploy logic for an application: the delete command may not be sent, the link may not be deleted, or cleanup may not run. Needs more investigation.

Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
test run_multiple_host_tests has been running for over 60 seconds
Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
Links: [InterfaceLinkDefinition { source_id: "http_server", target: "http_hello_world", name: "default", wit_namespace: "wasi", wit_package: "http", interfaces: ["incoming-handler"], source_config: ["hello_simple-httpaddr"], target_config: [] }]
thread 'run_multiple_host_tests' panicked at tests/e2e.rs:322:9:
Failed to get ok response from check: The link between the http provider and hello component should be removed
stack backtrace:
   0: rust_begin_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
   2: e2e_multiple_hosts::e2e::assert_status::{{closure}}
             at ./tests/e2e.rs:322:9
   3: e2e_multiple_hosts::test_no_requirements::{{closure}}
             at ./tests/e2e_multiple_hosts.rs:231:6
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123:9
   5: e2e_multiple_hosts::run_multiple_host_tests::{{closure}}
             at ./tests/e2e_multiple_hosts.rs:93:48
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123:9
   7: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:63
   8: tokio::runtime::coop::with_budget
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
   9: tokio::runtime::coop::budget
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
  10: tokio::runtime::park::CachedParkThread::block_on
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:31
  11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/blocking.rs:66:9
  12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
  13: tokio::runtime::context::runtime::enter_runtime
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
  14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
  15: tokio::runtime::runtime::Runtime::block_on
             at /Users/brooks/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/runtime.rs:351:45
  16: e2e_multiple_hosts::run_multiple_host_tests
             at ./tests/e2e_multiple_hosts.rs:103:5
  17: e2e_multiple_hosts::run_multiple_host_tests::{{closure}}
             at ./tests/e2e_multiple_hosts.rs:35:35
  18: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
  19: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test run_multiple_host_tests ... FAILED

failures:

failures:
    run_multiple_host_tests

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 121.49s

error: test failed, to rerun pass `--test e2e_multiple_hosts`
make: *** [test-individual-e2e] Error 101

brooksmtownsend, Jul 03 '24 20:07
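For context on the panic message, here is a minimal sketch of how a polling check of this kind could behave. It is not the actual `assert_status` helper from `tests/e2e.rs`; the function name `poll_until_ok`, its parameters, and the simulated check are assumptions made for illustration. The check is retried until it returns `Ok` or the timeout elapses, at which point the last error is wrapped in the same "Failed to get ok response from check" wording seen above.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not wadm's real test code): retry `check` until it
/// succeeds or `timeout` has elapsed, sleeping `interval` between attempts.
fn poll_until_ok<F>(mut check: F, timeout: Duration, interval: Duration) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match check() {
            Ok(()) => return Ok(()),
            // Out of time: surface the last failure reason to the caller.
            Err(e) if Instant::now() >= deadline => {
                return Err(format!("Failed to get ok response from check: {e}"));
            }
            // Still within the deadline: wait and try again.
            Err(_) => sleep(interval),
        }
    }
}

fn main() {
    // Simulated check: the link "disappears" after three failed attempts.
    let mut attempts_until_removed = 3;
    let result = poll_until_ok(
        || {
            if attempts_until_removed == 0 {
                Ok(())
            } else {
                attempts_until_removed -= 1;
                Err("link still exists".to_string())
            }
        },
        Duration::from_secs(1),
        Duration::from_millis(10),
    );
    println!("{result:?}");
}
```

With a check like this, a link that is never deleted keeps failing until the deadline, which matches the repeated `Links: [...]` output followed by the panic.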

If I had to guess, this is a hard-to-hit race condition that occurs when running multiple wadm instances and undeploying a manifest. It probably has to do with each wadm instance publishing cleanup events: in rare cases, the scalers running on one wadm instance have outdated state and decide they do not need to publish a link delete command when the link does in fact exist.

brooksmtownsend, Jul 03 '24 20:07
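The suspected race can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions, not wadm's implementation: `Lattice`, `LinkScaler`, `snapshot`, and `undeploy` are hypothetical names, and the real scalers and state store are considerably more involved. The point it shows is that a scaler deciding from a snapshot taken before the link was created will skip the delete command, leaving the link in place.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the shared, authoritative set of links.
#[derive(Default)]
struct Lattice {
    links: HashSet<String>,
}

/// Hypothetical scaler that decides from a cached snapshot, not live state.
struct LinkScaler {
    link: String,
    cached_links: HashSet<String>,
}

impl LinkScaler {
    /// Capture the lattice's links as this instance currently sees them.
    fn snapshot(link: &str, lattice: &Lattice) -> Self {
        LinkScaler {
            link: link.to_string(),
            cached_links: lattice.links.clone(),
        }
    }
}

/// Handle an undeploy. Returns true if a link delete was "published".
fn undeploy(scaler: &LinkScaler, lattice: &mut Lattice) -> bool {
    if scaler.cached_links.contains(&scaler.link) {
        lattice.links.remove(&scaler.link);
        true
    } else {
        // Stale view: the scaler believes there is nothing to delete.
        false
    }
}

fn main() {
    let link = "http_server -> http_hello_world";
    let mut lattice = Lattice::default();

    // One instance snapshots state before the link exists.
    let stale = LinkScaler::snapshot(link, &lattice);
    lattice.links.insert(link.to_string());

    // The stale instance handles the undeploy and skips the delete.
    let published = undeploy(&stale, &mut lattice);
    println!("stale: published = {published}, link exists = {}", lattice.links.contains(link));

    // An instance with a current snapshot deletes the link as expected.
    let fresh = LinkScaler::snapshot(link, &lattice);
    let published = undeploy(&fresh, &mut lattice);
    println!("fresh: published = {published}, link exists = {}", lattice.links.contains(link));
}
```

If this model is close to what happens, a fix would likely involve having the scaler re-read current state (or delete unconditionally) during cleanup rather than trusting its cached view.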