podman icon indicating copy to clipboard operation
podman copied to clipboard

quadlet: Add a network requirement on .image and .containers units

Open jbtrystram opened this issue 1 year ago • 18 comments

If a container unit starts on boot with a dependency on default.target the image unit may start too soon, before network is ready. This cause the unit to fail to pull the image. Add a dependency on network-online.target to make sure image pulls don't fail.

Does this PR introduce a user-facing change?

yes

Image oneshot units generated through quadlet will now have a dependency on `network-online.target`

Fixes https://github.com/containers/podman/issues/21873

jbtrystram avatar Mar 15 '24 14:03 jbtrystram

Cockpit tests failed for commit aa7b60454692c7f0b502ce3c9d749a7ece5ce3aa. @martinpitt, @jelly, @mvollmer please check.

Cockpit tests failed for commit d2d5b1e7755f736dafcc64ab2b04bde5696dba71. @martinpitt, @jelly, @mvollmer please check.

LGTM @ygalblum @alexlarsson @mheon PTAL

rhatdan avatar Mar 15 '24 21:03 rhatdan

The specific code LGTM. But, it addresses only the .image files. Doesn't the same rational apply for .container and .kube files?

In addition, while the documentation states that setting After= will remove this dependency, there is no test to verify it. In order for it to work, Quadlet needs to make sure the keys set by the user are placed after the ones it sets on its own. While this might be the case now, without a test, there is no guaranty that this behavior does not break in the future.

ygalblum avatar Mar 17 '24 07:03 ygalblum

@ygalblum I am looking at adding another test to verify that After= removes the dependency, but I can't see any assertion helper to use, so I am trying to add one.

I started to add assert-key-is-last which would call UnitFile.LookupLast but then I need the key to have no arguments. The assertion then would be assert-key-is-last-and-empty but then it becomes awfully specific ! Do you have any other idea ? It's my first contrib to podman so I don't know my way around the codebase :)

jbtrystram avatar Mar 19 '24 19:03 jbtrystram

@jbtrystram how about assert-last-key-is-regex and pass ^$?

ygalblum avatar Mar 20 '24 07:03 ygalblum

@ygalblum neat suggestion, thanks ! I added that. I am having synchronisation issues when running make test locally though, let's see if the CI runs the test !

jbtrystram avatar Mar 20 '24 08:03 jbtrystram

Cockpit tests failed for commit 2b45a19b1c52b8765721f07c11be5a2380fc6c39. @martinpitt, @jelly, @mvollmer please check.

Cockpit tests failed for commit c26db33125b2b5f044d4d016d3642d3f7508b0b4. @martinpitt, @jelly, @mvollmer please check.

added a comment to the issue but linking it here:

https://github.com/containers/podman/issues/21873#issuecomment-2025643611

dustymabe avatar Mar 28 '24 16:03 dustymabe

@jbtrystram What is going on with this PR?

rhatdan avatar Mar 29 '24 11:03 rhatdan

@rhatdan This is something i am working on my spare time and i hadn't had a lot of that lately, sorry. I'll follow up soon

jbtrystram avatar Mar 29 '24 11:03 jbtrystram

Ephemeral COPR build failed. @containers/packit-build please check.

packit fails with : packit.exceptions.PackitSRPMException: Preparation of the repository for creation of an SRPM failed: Reference at 'HEAD' does not exist

jbtrystram avatar Apr 23 '24 12:04 jbtrystram

@ygalblum I picked that up and fixed it. The quadlet.go code behave as expected, as quadlet --dryrun on unit-after-override.image gives :

---test.service---
## assert-last-key-is-regex "Unit" "After" "^$"

[Unit]
Wants=network-online.target
After=network-online.target
After=
SourcePath=/etc/containers/systemd/test.container
RequiresMountsFor=%t/containers

.....

However the test code with that assert-last-key-is-regex does not work yet. I'll keep working on it but still pushed, if you have some suggestions in the meanwhile i'll take them :)

I tried to add some debug statements in the test code but they're not printed when running make localintegration FOCUS_FILE=quadlet_test.go

jbtrystram avatar Apr 24 '24 13:04 jbtrystram

Another two tests are failing since they already check the After key. The check assert-key-is expects all the values in the correct order. So you need to change the following: test/e2e/quadlet/mount.container, line 13:

## assert-key-is "Unit" "After" "network-online.target" "vol2-volume.service"

test/e2e/quadlet/network.quadlet.container, line 3:

## assert-key-is "Unit" "After" "network-online.target" "basic-network.service"

ygalblum avatar Apr 25 '24 11:04 ygalblum

Thanks @ygalblum for the help. I squashed the commits and rebased. The failing tests seem unrelated

jbtrystram avatar Apr 25 '24 18:04 jbtrystram

/retest

jbtrystram avatar May 06 '24 18:05 jbtrystram

@jbtrystram: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar May 06 '24 18:05 openshift-ci[bot]

/ok-to-test

ygalblum avatar May 07 '24 06:05 ygalblum

/retest

ygalblum avatar May 07 '24 06:05 ygalblum

@ygalblum I rebased this to see if that help with the tests.

jbtrystram avatar May 21 '24 14:05 jbtrystram

test_fixes.txt @jbtrystram The tests fail because they were checking the values of the After key and this PR changes them (as it automatically adds network-online.target. I've attached a patch file you can use to fix the tests. When in the Podman base directory run:

patch -p1 < test_fixes.txt

ygalblum avatar May 22 '24 07:05 ygalblum

@ygalblum I got confused by your patch because I am 100% sure I already fixed those tests earlier. I think I messed up the rebase yesterday and pushed an out of date branch

jbtrystram avatar May 22 '24 08:05 jbtrystram

yeah, that's better. I found my previous work on another machine. Sorry for the trouble @ygalblum and thanks for your patience

The test failure is related to buildah :

[+0784s] not ok 119 bud and test --unsetlabel
         # (from function `assert' in file tests/[helpers.bash, line 521](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/helpers.bash#L521),
         #  from function `expect_output' in file tests/[helpers.bash, line 548](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/helpers.bash#L548),
         #  in test file tests/[bud.bats, line 2284](https://github.com/containers/podman/blob/401a1ecd8ecd51efde63fc36d6837df307f173f7/test/system/bud.bats#L2284))
         #   `expect_output "fedora" "name must be fedora from base image"' failed
         # /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5
         # # [checking for: registry.fedoraproject.org/fedora-minimal]
         # # [copy docker://registry.fedoraproject.org/fedora-minimal dir:/var/tmp/buildah-image-cache.3740/registry.fedoraproject.org-fedora-minimal-]
         # Getting image source signatures
         # Copying blob sha256:f8af2181b761a975e3199969f7c3e60950837cdae4d7fb861fc40bacc53c566f
         # Copying config sha256:0eb7b1a2db1ea0de4fb13b46eeabe19e6e3ccc30be8164b5c46e8a043c428e0f
         # Writing manifest to image destination
         # # [copy dir:/var/tmp/buildah-image-cache.3740/registry.fedoraproject.org-fedora-minimal- containers-storage:registry.fedoraproject.org/fedora-minimal]
         # Getting image source signatures
         # Copying blob sha256:f8af2181b761a975e3199969f7c3e60950837cdae4d7fb861fc40bacc53c566f
         # Copying config sha256:0eb7b1a2db1ea0de4fb13b46eeabe19e6e3ccc30be8164b5c46e8a043c428e0f
         # Writing manifest to image destination
         # # buildah --version
         # buildah version 1.36.0-dev (image-spec 1.1.0, runtime-spec 1.1.0)
         # # podman-remote build --force-rm=false --layers=false --signature-policy /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests/policy.json -t exp -f /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5/tests/bud/base-with-labels/Containerfile
         # STEP 1/2: FROM registry.fedoraproject.org/fedora-minimal
         # STEP 2/2: RUN echo hello
         # hello
         # COMMIT exp
         # Getting image source signatures
         # Copying blob sha256:5bb445179d0077f481b41ef6bf437894ed7ed34f48f475c496daaa08cf640260
         # Copying blob sha256:1b9e48ebc05e985b80c66aaa9cba990df285deac3a0ea3a874f85d1a54cc1c01
         # Copying config sha256:1e53504e3490785ef0a01ce52855ac4f2f35ea3b741a21b6c1fbf3b569613e6c
         # Writing manifest to image destination
         # --> 1e53504e3490
         # Successfully tagged localhost/exp:latest
         # 1e53504e3490785ef0a01ce52855ac4f2f35ea3b741a21b6c1fbf3b569613e6c
         # # buildah inspect --format {{ index .Docker.Config.Labels "license"}} exp
         # MIT
         # # buildah inspect --format {{ index .Docker.Config.Labels "name"}} exp
         # fedora-minimal
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #|     FAIL: name must be fedora from base image
         # #| expected: 'fedora'
         # #|   actual: 'fedora-minimal'
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         # teardown: stopping podman server 49943
         # /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.35.1-0.20240412112838-e393e57728f5

[...]
  Error running buildah bud tests under podman.
         
         Is it possible that your PR breaks podman build in some way? Please
         review the test failure and double-check your changes.
         
         
         Please see test/buildah-bud/README.md for advice

jbtrystram avatar May 22 '24 11:05 jbtrystram

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbtrystram, ygalblum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 22 '24 12:05 openshift-ci[bot]

I don't think this failure have anything to do with my PR :

# podman run --name o8zHCMmnW1 --health-cmd=touch /terminate --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c while test \! -e /terminate; do sleep 0.1; done; echo finished
<+574ms> # finished
         # Error: container is stopped
<+004ms> # [ rc=126 (** EXPECTED 0 **) ]
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: exit code is 126; expected 0
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

https://api.cirrus-ci.com/v1/artifact/task/5162052191256576/html/sys-podman-fedora-40-aarch64-root-host-sqlite.log.html

jbtrystram avatar May 22 '24 14:05 jbtrystram

@containers/podman-maintainers PTAL

ygalblum avatar May 22 '24 15:05 ygalblum

Known flake, restarted

mheon avatar May 22 '24 17:05 mheon

everything passed ! Can I get a LGTM ?

jbtrystram avatar May 22 '24 18:05 jbtrystram