qm icon indicating copy to clipboard operation
qm copied to clipboard

Check BlueChi is ok e2e test fails

Open engelmi opened this issue 8 months ago • 2 comments
trafficstars

The /tests/qm-sanity-test/check_bluechi_is_ok was flaky due to the qm.service receiving a SIGINT and being restarted in a loop. So depending on the logs and the point in time of checking the test passed or failed.

With the refactoring in #754 and the usage of --wait=5000 the flakiness should have disappeared (since the agent in QM will connect to the controller at one point in time), but it is masking a failure of the qm.service receiving a SIGINT.

One test run showing this can be seen here: https://artifacts.dev.testing-farm.io/985c0972-6635-4a1b-873d-8124c0db87e8/

Copy of the log Failed to get property: Invalid request descriptor org.eclipse.bluechi is offline [ INFO ] FAIL: check_bluechi_is_ok: host bluechi-agent localrootfs is not connected to controller. [ INFO ] Journal for bluechi-controller:

Mar 19 07:43:42 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Registered managed node from fd 11 as 'qm.localrootfs' Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Node 'qm.localrootfs' disconnected Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Registered managed node from fd 11 as 'qm.localrootfs' Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Registered managed node from fd 10 as 'localrootfs' Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Heartbeat disabled since configured interval '0' is <=0 Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Waiting for connection requests on socket /run/bluechi/bluechi.sock... Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Waiting for connection requests on port 842... Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-controller[10267]: Starting bluechi-controller 0.11.0-0.202503110917.git15a51c4 Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Started BlueChi Controller systemd service. [ INFO ] Journal for local bluechi-agent:

Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-agent[10269]: Connected to controller as 'localrootfs' Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-agent[10269]: Register call response received Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-agent[10269]: Connecting to controller on tcp:host=127.0.0.1,port=842 Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal bluechi-agent[10269]: Starting bluechi-agent 0.11.0-0.202503110917.git15a51c4 Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Started BlueChi systemd service controller agent daemon. [ INFO ] Journal for qm bluechi-agent:

Mar 19 07:43:42 ip-172-31-16-72.us-east-2.compute.internal qm[10345]: 5eeede3d0680c24e8d0723967c1039e8b362061bd14021c1b957a2866632bb55 Mar 19 07:43:42 ip-172-31-16-72.us-east-2.compute.internal podman[10345]: 2025-03-19 07:43:42.014672825 +0000 UTC m=+0.185404389 container start 5eeede3d0680c24e8d0723967c1039e8b362061bd14021c1b957a2866632bb55 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:42 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Started qm.service. Mar 19 07:43:42 ip-172-31-16-72.us-east-2.compute.internal podman[10345]: 2025-03-19 07:43:42.010151863 +0000 UTC m=+0.180883427 container init 5eeede3d0680c24e8d0723967c1039e8b362061bd14021c1b957a2866632bb55 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal podman[10345]: 2025-03-19 07:43:41.85192552 +0000 UTC m=+0.022657084 container create 5eeede3d0680c24e8d0723967c1039e8b362061bd14021c1b957a2866632bb55 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Starting qm.service... Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Stopped qm.service. Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: qm.service: Failed with result 'exit-code'. Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: qm.service: Main process exited, code=exited, status=130/n/a Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal qm[10272]: 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal podman[10272]: 2025-03-19 07:43:41.770995537 +0000 UTC m=+0.296723309 container remove 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal podman[10272]: 2025-03-19 07:43:41.630586128 +0000 UTC m=+0.156313910 container died 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:41 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Stopping qm.service... Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal qm[10040]: 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal podman[10040]: 2025-03-19 07:43:40.911384004 +0000 UTC m=+0.592805293 container start 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Started qm.service. Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal podman[10040]: 2025-03-19 07:43:40.904218692 +0000 UTC m=+0.585639971 container init 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal podman[10040]: 2025-03-19 07:43:40.373663519 +0000 UTC m=+0.055084798 container create 0af12bd8f284cb8151ca8d360b5fd899999fa74d3b5865d90a99336ee4720d09 (image=, name=qm, PODMAN_SYSTEMD_UNIT=qm.service) Mar 19 07:43:40 ip-172-31-16-72.us-east-2.compute.internal systemd[1]: Starting qm.service... Shared connection to 18.117.150.198 closed.

The reason for the qm.service being stopped needs to be investigated.

engelmi avatar Mar 19 '25 12:03 engelmi