CI: rootless integration test hang
Something recent seem to have caused a hang in the rootless integration tests in CI, it is not clear what test, here are two logs:
https://cirrus-ci.com/task/4733420225429504 https://cirrus-ci.com/task/5597909967699968
From the journal logs which seem to dump the process list
...
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 1338 1 sshd: /usr/sbin/sshd -D [listener] 1 of 10-100 startups
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 1385 1302 /bin/sh /tmp/metadata-scripts3233089414/startup-script
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 1414 1385 ./cirrus agent -task-id 5597909967699968 -client-token 3d85620f6b894170a559100f8aa9c9dd -server-token 3f7d45a01eb8439e8ccc8548a664c7f3 -api-endpoint grpc.cirrus-ci.com:443
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 1950 1 /usr/bin/conmon --api-version 1 -c f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b -u f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b/userdata -p /run/containers/storage/overlay-containers/f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b/userdata/pidfile -n registry --exit-dir /run/libpod/exits --persist-dir /run/libpod/persist/f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b --full-attach -s -l journald --log-level warning --syslog --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b/userdata/oci-log --conmon-pidfile /run/containers/storage/overlay-containers/f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --stopped-only --exit-command-arg f850d92cb76b72e63b5b4a4e05fa2cf8c8674c930352585a4649bce43bc37a9b
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 1952 1950 registry serve /etc/docker/registry/config.yml
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 15964 1414 /bin/bash /tmp/scripts454b54dc448b1000b8b2e5254677a396.sh
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 15965 15964 ssh rootlessuser@localhost -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no /var/tmp/go/src/github.com/containers/buildah/./contrib/cirrus/test.sh integration
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 15966 15964 awk -f /var/tmp/go/src/github.com/containers/buildah/./contrib/cirrus/timestamp.awk
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16016 1338 sshd-session: rootlessuser [priv]
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16021 1 /usr/lib/systemd/systemd --user
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16057 16016 sshd-session: rootlessuser@notty
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16058 16057 sh -c /var/tmp/go/src/github.com/containers/buildah/./contrib/cirrus/test.sh integration
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16059 16058 bash /var/tmp/go/src/github.com/containers/buildah/./contrib/cirrus/test.sh integration
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16270 16021 catatonit -P
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16327 16021 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16442 16059 make test-integration
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16533 16442 /bin/sh -c cd tests; ./test_runner.sh
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16534 16533 bash ./test_runner.sh
Feb 19 00:49:01 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16538 16534 bash /usr/libexec/bats-core/bats -j 8 --tap .
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16545 16538 bash /usr/libexec/bats-core/bats-exec-suite --dummy-flag -j 8 /var/tmp/go/src/github.com/containers/buildah/tests/./add.bats /var/tmp/go/src/github.com/containers/buildah/tests/./authenticate.bats /var/tmp/go/src/github.com/containers/buildah/tests/./basic.bats /var/tmp/go/src/github.com/containers/buildah/tests/./blobcache.bats /var/tmp/go/src/github.com/containers/buildah/tests/./bud.bats /var/tmp/go/src/github.com/containers/buildah/tests/./bud_overlay_leaks.bats /var/tmp/go/src/github.com/containers/buildah/tests/./byid.bats /var/tmp/go/src/github.com/containers/buildah/tests/./cdi.bats /var/tmp/go/src/github.com/containers/buildah/tests/./chroot.bats /var/tmp/go/src/github.com/containers/buildah/tests/./commit.bats /var/tmp/go/src/github.com/containers/buildah/tests/./config.bats /var/tmp/go/src/github.com/containers/buildah/tests/./containers.bats /var/tmp/go/src/github.com/containers/buildah/tests/./containers_conf.bats /var/tmp/go/src/github.com/containers/buildah/tests/./copy.bats /var/tmp/go/src/github.com/containers/buildah/tests/./digest.bats /var/tmp/go/src/github.com/containers/buildah/tests/./formats.bats /var/tmp/go/src/github.com/containers/buildah/tests/./from.bats /var/tmp/go/src/github.com/containers/buildah/tests/./help.bats /var/tmp/go/src/github.com/containers/buildah/tests/./history.bats /var/tmp/go/src/github.com/containers/buildah/tests/./images.bats /var/tmp/go/src/github.com/containers/buildah/tests/./info.bats /var/tmp/go/src/github.com/containers/buildah/tests/./inspect.bats /var/tmp/go/src/github.com/containers/buildah/tests/./lists.bats /var/tmp/go/src/github.com/containers/buildah/tests/./loglevel.bats /var/tmp/go/src/github.com/containers/buildah/tests/./mkcw.bats /var/tmp/go/src/github.com/containers/buildah/tests/./mount.bats /var/tmp/go/src/github.com/containers/buildah/tests/./namespaces.bats /var/tmp/go/src/github.com/containers/buildah/tests/./overlay.bats /var/tmp/go/src/github.com/containers/buildah/tests/./pull.bats /var/tmp/go/src/github.com/containers/buildah/tests/./push.bats /var/tmp/go/src/github.com/containers/buildah/tests/./registries.bats /var/tmp/go/src/github.com/containers/buildah/tests/./rename.bats /var/tmp/go/src/github.com/containers/buildah/tests/./rm.bats /var/tmp/go/src/github.com/containers/buildah/tests/./rmi.bats /var/tmp/go/src/github.com/containers/buildah/tests/./run.bats /var/tmp/go/src/github.com/containers/buildah/tests/./sbom.bats /var/tmp/go/src/github.com/containers/buildah/tests/./selinux.bats /var/tmp/go/src/github.com/containers/buildah/tests/./sign.bats /var/tmp/go/src/github.com/containers/buildah/tests/./source.bats /var/tmp/go/src/github.com/containers/buildah/tests/./squash.bats /var/tmp/go/src/github.com/containers/buildah/tests/./ssh.bats /var/tmp/go/src/github.com/containers/buildah/tests/./subscriptions.bats /var/tmp/go/src/github.com/containers/buildah/tests/./tag.bats /var/tmp/go/src/github.com/containers/buildah/tests/./tutorial.bats /var/tmp/go/src/github.com/containers/buildah/tests/./umount.bats
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16546 16538 bash /usr/libexec/bats-core/bats -j 8 --tap .
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16547 16538 bash /usr/libexec/bats-core/bats-format-cat --dummy-flag
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 16548 16547 cat
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 17874 16545 /usr/bin/perl /usr/bin/parallel --keep-order --jobs 8 -- bats-exec-file --dummy-flag -j 8 {} /var/tmp/bats-run-56dgN0/test_list_file.txt
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 94235 2
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 283328 17874 bash /usr/libexec/bats-core/bats-exec-file --dummy-flag -j 8 /var/tmp/go/src/github.com/containers/buildah/tests/./source.bats /var/tmp/bats-run-56dgN0/test_list_file.txt
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 300188 283328 bash /usr/libexec/bats-core/bats-exec-file --dummy-flag -j 8 /var/tmp/go/src/github.com/containers/buildah/tests/./source.bats /var/tmp/bats-run-56dgN0/test_list_file.txt
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 300316 300188 bash /usr/libexec/bats-core/bats-exec-test --dummy-flag /var/tmp/go/src/github.com/containers/buildah/tests/./source.bats test_source_push-2fpull 826 3 1
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 306826 300316 /var/tmp/go/src/github.com/containers/buildah/tests/./../bin/buildah --storage-driver vfs --root /var/tmp/buildah_tests.wsc5hw/registry/root --runroot /var/tmp/buildah_tests.wsc5hw/registry/run run --net host registry-working-container /entrypoint.sh /etc/docker/registry/config.yml
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 306879 306826 /var/tmp/go/src/github.com/containers/buildah/tests/./../bin/buildah-in-a-user-namespace --storage-driver vfs --root /var/tmp/buildah_tests.wsc5hw/registry/root --runroot /var/tmp/buildah_tests.wsc5hw/registry/run run --net host registry-working-container /entrypoint.sh /etc/docker/registry/config.yml
Feb 19 00:49:02 cirrus-task-5597909967699968 google_metadata_script_runner[1302]: startup-script: 307195 306879 buildah-oci-runtime
...
The other journal log seems to be cut of and does not show all processes, so if you see this again please link the tasks here so we can find the common battern
Another log: https://cirrus-ci.com/task/5817056873938944
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 188843 17899 bash /usr/libexec/bats-core/bats-exec-file --dummy-flag -j 8 /var/tmp/go/src/github.com/containers/buildah/tests/./pull.bats /tmp/bats-run-hwX3aY/test_list_file.txt
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 222279 2
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 258496 188843 bash /usr/libexec/bats-core/bats-exec-file --dummy-flag -j 8 /var/tmp/go/src/github.com/containers/buildah/tests/./pull.bats /tmp/bats-run-hwX3aY/test_list_file.txt
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 258677 258496 bash /usr/libexec/bats-core/bats-exec-test --dummy-flag /var/tmp/go/src/github.com/containers/buildah/tests/./pull.bats test_pull_with_authfile 728 17 1
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 260749 2
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 262574 258677 /var/tmp/go/src/github.com/containers/buildah/tests/./../bin/buildah --storage-driver vfs --root /tmp/buildah_tests.asmgpy/registry/root --runroot /tmp/buildah_tests.asmgpy/registry/run run --net host registry-working-container /entrypoint.sh /etc/docker/registry/config.yml
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 262622 262574 /var/tmp/go/src/github.com/containers/buildah/tests/./../bin/buildah-in-a-user-namespace --storage-driver vfs --root /tmp/buildah_tests.asmgpy/registry/root --runroot /tmp/buildah_tests.asmgpy/registry/run run --net host registry-working-container /entrypoint.sh /etc/docker/registry/config.yml
Mar 06 19:22:14 cirrus-task-5817056873938944 google_metadata_script_runner[1301]: startup-script: 262943 262622 buildah-oci-runtime
I am suspecting this is happening after me moved to tmpfs I am not sure sure if we saw this before.
A friendly reminder that this issue had no activity for 30 days.
@flouthoc @nalind Do you still see the test hangs?
I haven't seen it in a while but let's keep this open.
This is still failing https://cirrus-ci.com/task/5042262020718592
Still failing https://cirrus-ci.com/task/6130534289506304