
Skopeo: Backport a patch to avoid a panic when compiled with Go >= 1.22

Open mtrmac opened this issue 1 year ago • 9 comments

Try the “easy” way to fix https://github.com/containers/skopeo/pull/2328 .

Warning: I don’t know what I am doing. I’m hoping this PR, even before merging, generates images which can be used to run Skopeo tests to confirm this works as expected.

I have locally tested the sed command, but nothing else.

mtrmac avatar May 21 '24 18:05 mtrmac

Change LGTM, I can't see why the new images won't build. Oh, looks like I need to give you access to this repo. One sec...

cevich avatar May 21 '24 19:05 cevich

...access granted. @mtrmac you'll need to re-push unfortunately. There's a validation check that will fail in Cirrus if I simply tell it to "go" as-is.

cevich avatar May 21 '24 19:05 cevich

Thanks, re-pushed.

mtrmac avatar May 21 '24 19:05 mtrmac

rawhide: No match for argument: mlocate

mlocate was retired this week (🍷 for a very old project of mine). Do we actually need that?

mtrmac avatar May 21 '24 21:05 mtrmac

Nuke it and we'll find out. Do you know if there's a successor? I'm going to miss it, I use it daily at home.

edsantiago avatar May 21 '24 23:05 edsantiago

> Nuke it and we'll find out.

Done.

> Do you know if there's a successor? I'm going to miss it, I use it daily at home.

https://fedoraproject.org/wiki/Changes/Plocate_as_the_default_locate_implementation , supposedly since F36.

mtrmac avatar May 22 '24 16:05 mtrmac

> Do we actually need that? (mlocate)

Git blame says it came in by 38fa0c65 which is incredibly ancient. However, the origin of that change was exclusive to podman CI. So if removing mlocate still passes podman CI I think we're good.

cevich avatar May 22 '24 18:05 cevich

Note: I re-ran a few container build tasks. Looks like quay or networking flakes.

cevich avatar May 22 '24 18:05 cevich

Test get_ci_vm_entrypoint seems to be failing hard:

rm 'get_ci_vm/good_repo_test/uninit_gcloud.output'
Testing: Verify mock 'gcevm' flavor main() workflow produces expected output
fail - Expected exit-code 0 but received 128 while executing mock_gcevm_workflow (output follows)
Winning lottery-number checksum: 0
gcloud --configuration=automation_images --project=automation_images compute instances create --zone=us-central1-a --image-project=automation_images --image=test-image-name --custom-cpu=0 --custom-memory=0Gb --boot-disk-size=0 --labels=in-use-by=foobar foobar-test-image-name
gcloud --configuration=automation_images --project=automation_images compute ssh --ssh-flag=-o=AddKeysToAgent=yes --force-key-file-overwrite --strict-host-key-checking=no --zone=us-central1-a root@foobar-test-image-name -- true
Cloning into '/tmp/get_ci_vm_ifMxex.tmp/var/tmp/automation_images'...
fatal: detected dubious ownership in repository at '/tmp/cirrus-ci-build/get_ci_vm/good_repo_test/.git'
To add an exception for this directory, call:

	git config --global --add safe.directory /tmp/cirrus-ci-build/get_ci_vm/good_repo_test/.git
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

This is not code touched by this PR.

@cevich any chance you could take a quick look? I'm having trouble understanding. https://github.com//containers/automation_images/blob/b7395d11fee6e977d2256b536a485fdb9811839b/get_ci_vm/test.sh#L276

edsantiago avatar May 22 '24 19:05 edsantiago

> This is not code touched by this PR.

Agreed, this is unrelated. Must be the result of a git fix/update. I'll take a look and work on this today, hopefully it's an easy fix.
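
For reference, a minimal sketch of the exception that git's own error message suggests, applied to the path from the log above. It assumes a git new enough to honor `GIT_CONFIG_GLOBAL`, and it writes to a scratch file rather than the real global config. The `mark_safe` helper name is hypothetical, and this is not necessarily how the eventual fix was implemented.

```shell
# Use a scratch file as the "global" config so the real ~/.gitconfig is untouched.
GIT_CONFIG_GLOBAL="$(mktemp)"
export GIT_CONFIG_GLOBAL

# Hypothetical helper: tell git to trust a repository owned by another user.
mark_safe() {
    # $1: path that git reported as having "dubious ownership"
    git config --global --add safe.directory "$1"
}

mark_safe /tmp/cirrus-ci-build/get_ci_vm/good_repo_test/.git

# Show what was recorded.
git config --global --get-all safe.directory
```

Scoping the exception to one directory keeps git's ownership check in force everywhere else.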

cevich avatar May 23 '24 14:05 cevich

Well...was an easy fix but hard to find the right place to implement. Local testing shows I found it, so https://github.com/containers/automation_images/pull/356 should fix this.

cevich avatar May 23 '24 18:05 cevich

Done. Okay, rebase this and it should work :crossed_fingers:

cevich avatar May 23 '24 19:05 cevich

Rebased. Thanks @cevich .

mtrmac avatar May 23 '24 19:05 mtrmac

Cirrus CI build successful. Found built image names and IDs:

| Stage | Image Name | IMAGE_SUFFIX |
| --- | --- | --- |
| base | debian | do-not-use |
| base | fedora | do-not-use |
| base | fedora-aws | do-not-use |
| base | fedora-aws-arm64 | do-not-use |
| base | image-builder | do-not-use |
| base | prior-fedora | do-not-use |
| cache | build-push | c20240523t185657z-f40f39d13 |
| cache | debian | c20240523t185657z-f40f39d13 |
| cache | fedora | c20240523t185657z-f40f39d13 |
| cache | fedora-aws | c20240523t185657z-f40f39d13 |
| cache | fedora-netavark | c20240523t185657z-f40f39d13 |
| cache | fedora-netavark-aws-arm64 | c20240523t185657z-f40f39d13 |
| cache | fedora-podman-aws-arm64 | c20240523t185657z-f40f39d13 |
| cache | fedora-podman-py | c20240523t185657z-f40f39d13 |
| cache | prior-fedora | c20240523t185657z-f40f39d13 |
| cache | rawhide | c20240523t185657z-f40f39d13 |
| cache | win-server-wsl | c20240523t185657z-f40f39d13 |

github-actions[bot] avatar May 23 '24 20:05 github-actions[bot]

  • 20240523t185657z-f40f39d13
  • 20240513t140131z-f40f39d13 ⇑ (built in #351)
debian prior-fedora fedora fedora-aws rawhide
base 13.2 39-1.5 Generic ? 41-0
kernel 6.8.9-1 6.8.10-200 6.8.5-301 6.8.5-301 6.9.0-64
6.7.12-1 ⇑ 6.8.9-200 ⇑ 6.8.9-300 ⇑ 6.8.9-300 ⇑ 6.9.0-0.rc7.20240510git448b3fe5a0ea.62 ⇑
grub2-common 2.12-2 2.06-120 2.06-121 2.06-121 2.06-121
aardvark-dns 1.4.0-5.1 1.10.0-1 1.10.0-1 1.10.0-1 1.10.0-1
netavark 1.4.0-4.1 1.10.3-1 1.10.3-3 1.10.3-3 1.10.3-3
buildah 1.33.7+ds1-1 1.35.4-1 1.35.4-1 1.35.3-1 1.35.4-1
1.35.3-1 ⇑
conmon 2.1.10+ds1-1+b1 2.1.10-1 2.1.12-1 2.1.10-1 2.1.10-1
2.1.10-1 ⇑
container-selinux ? 2.231.0-1 2.231.0-1 2.231.0-1 2.231.0-1
2.230.0-1 ⇑ 2.230.0-1 ⇑
containers-common ? 1-99 0.58.0-2 0.58.0-2 0.58.0-18
criu 3.17.1-3 3.19-2 3.19-4 3.19-4 3.19-4
crun 1.15-1 1.15-1 1.15-1 1.15-1 1.15-1
1.14.4-1 ⇑ 1.14.4-1 ⇑
docker-ce 5:26.1.3-1~debian.12~bookworm ? ? ? ?
5:26.1.2-1~debian.12~bookworm ⇑
golang 2:1.22~3 1.21.10-1 1.22.3-1 1.22.3-1 1.22.3-1
1.21.9-1 ⇑ 1.22.2-1 ⇑
gvisor-tap-vsock ? 0.7.3-1 0.7.3-2 0.7.3-2 0.7.3-2
nmap-ncat 7.94+git20230807.3be01efb1+dfsg-3+b1 7.95-1 7.95-1 7.95-1 7.95-1
passt 2024-04-26 2024-04-26 2024-05-10 2024-05-10 2024-05-10
2024-04-26 ⇑
podman 4.9.4+ds1-1 4.9.4-1 5.1.0~rc1-1 5.0.3-1 5.0.3-1
5.0.3-1 ⇑ 5.0.2-1 ⇑ 5.0.2-1 ⇑
runc 1.1.12+ds1-2 1.1.12-1 1.1.12-3 1.1.12-3 1.1.12-3
skopeo 1.13.3+ds1-2+b1 1.15.0-1 1.15.1-1 1.15.0-1 1.15.1-1
1.15.0-1 ⇑ 1.15.0-1 ⇑
slirp4netns 1.2.1-1+b1 1.2.2-1 1.2.2-2 1.2.2-2 1.2.2-2
systemd 256~rc3-1 254.12-1 255.6-1 255.6-1 256~rc2-1
255.5-1 ⇑ 254.10-1 ⇑ 255.5-1 ⇑
tar 1.34+dfsg-1.2+deb12u1 1.35-2 1.35-3 1.35-3 1.35-3

edsantiago avatar May 23 '24 20:05 edsantiago

Downstream tests:

  • https://github.com/containers/skopeo/pull/2340 to see whether this fixes the panic
  • https://github.com/containers/podman/pull/22820 to verify removing mlocate does not break tests

mtrmac avatar May 27 '24 14:05 mtrmac

@mtrmac see my comments in your podman PR. Debian is broken, not booting.

CAUSE: new systemd (see above table)

SOLUTION:

  1. wait for #338 to merge (will take a long time because it will need podman-buildah-everything testing and I will be OOTO much of today); or
  2. block systemd upgrade on debian. Example on how to do that: https://github.com//containers/automation_images/blob/afe1ced362caf3ad65fc502da6af3567de0266f3/base_images/debian_base-setup.sh#L46-L57
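
As an illustration of option 2, here is a minimal sketch of an apt preferences stanza that refuses one version series. It is an assumption about the general shape of such a pin, not a copy of what the linked script does. The file name `systemd-hold` is hypothetical, and the stanza is written to a scratch directory instead of `/etc/apt/preferences.d/`.

```shell
# Write the stanza to a scratch directory; a real setup script would
# target /etc/apt/preferences.d/ instead (needs root).
PIN_DIR="$(mktemp -d)"
PIN_FILE="$PIN_DIR/systemd-hold"   # hypothetical file name

# A negative Pin-Priority tells apt never to install the matching version.
# The package name and version pattern here are illustrative only.
cat > "$PIN_FILE" <<'EOF'
Package: systemd
Pin: version 256*
Pin-Priority: -1
EOF

cat "$PIN_FILE"
```

A stanza like this only covers the package names it lists, so every binary package that should stay back needs to be matched.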

edsantiago avatar May 28 '24 14:05 edsantiago

@mtrmac Ed's second suggestion (unfortunately) isn't abnormal for this repo. If you're at all uncomfortable making that change, poke me and I'll take care of it for you.

Edit: It looks like your workaround is at least functional as the https://github.com/containers/skopeo/pull/2340 CI passed.

cevich avatar May 28 '24 15:05 cevich

Cirrus CI build successful. Found built image names and IDs:

| Stage | Image Name | IMAGE_SUFFIX |
| --- | --- | --- |
| base | debian | do-not-use |
| base | fedora | do-not-use |
| base | fedora-aws | do-not-use |
| base | fedora-aws-arm64 | do-not-use |
| base | image-builder | do-not-use |
| base | prior-fedora | do-not-use |
| cache | build-push | c20240528t183210z-f40f39d13 |
| cache | debian | c20240528t183210z-f40f39d13 |
| cache | fedora | c20240528t183210z-f40f39d13 |
| cache | fedora-aws | c20240528t183210z-f40f39d13 |
| cache | fedora-netavark | c20240528t183210z-f40f39d13 |
| cache | fedora-netavark-aws-arm64 | c20240528t183210z-f40f39d13 |
| cache | fedora-podman-aws-arm64 | c20240528t183210z-f40f39d13 |
| cache | fedora-podman-py | c20240528t183210z-f40f39d13 |
| cache | prior-fedora | c20240528t183210z-f40f39d13 |
| cache | rawhide | c20240528t183210z-f40f39d13 |
| cache | win-server-wsl | c20240528t183210z-f40f39d13 |

github-actions[bot] avatar May 28 '24 19:05 github-actions[bot]

NO GO. Repeat, NO GO. Something went wrong. systemd is still a bad version:

  • 20240528t183210z-f40f39d13
  • 20240513t140131z-f40f39d13 ⇑ (built in #351) [Ed note: same baseline as above, i.e. what's in podman now]
debian prior-fedora fedora fedora-aws rawhide
systemd 256~rc3-4 254.12-1 255.7-1 255.6-1 256~rc3-1
255.5-1 ⇑ 254.10-1 ⇑ 255.6-1 ⇑ 255.5-1 ⇑

edsantiago avatar May 28 '24 19:05 edsantiago

I can't figure out what went wrong. The log shows your pin file, then:

    debian: Unpacking libnss-resolve:amd64 (256~rc3-4) over (252.22-1~deb12u1) ...

Pinning allows globs, so maybe retry the 256 block with just 256* ? (At least I think it's globs. If it's regexps, that won't work at all). Sorry, I'm stuck.
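
To make the glob idea concrete, the sketch below checks which version strings from this thread a `256*` pattern would match under plain shell glob rules. It assumes apt's pin matching behaves like shell globbing for this pattern, which is not verified here. The `matches_glob` helper is hypothetical.

```shell
# Hypothetical helper: return 0 when version $1 matches shell glob $2.
matches_glob() {
    case "$1" in
        $2) return 0 ;;   # unquoted, so $2 is treated as a pattern
        *)  return 1 ;;
    esac
}

# Version strings taken from the tables and log line in this thread.
for v in '256~rc3-4' '256~rc2-1' '255.5-1' '252.22-1~deb12u1'; do
    if matches_glob "$v" '256*'; then
        echo "$v: matched by 256*"
    else
        echo "$v: not matched"
    fi
done
```

Under these rules the pattern covers both 256 release candidates and leaves the 255 and 252 versions alone.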

edsantiago avatar May 28 '24 20:05 edsantiago

I'd like to suggest something helpful, but I'm out of my depth here. Debugging problems in the "base" stage build can be challenging. It's not possible to use hack/get_ci_vm.sh to get anything earlier than a "base" stage VM (which is what is coming in here). The only way I know of is using the GCE Web UI or CLI to create a custom one. In this case Packer is grabbing the latest image by "family", so debian-12 according to the Makefile. In case that helps.

cevich avatar May 29 '24 12:05 cevich

EXECUTIVE DECISION: I am merging this. DO NOT USE THESE VMS!!!!!!!!!!!!!!!!

I am merging because the Skopeo fixes look good, and the mlocate/plocate too, and we REALLY NEED #338 SO PLEASE NO MORE MERGES INTO THIS REPO

edsantiago avatar May 29 '24 14:05 edsantiago

Ah phooey. Too late. Never mind, I'll just bring in the skopeo commit

edsantiago avatar May 29 '24 14:05 edsantiago

Since you're going to be building images in the other PR right away anyway (with a newer IMG_SFX) maybe it's okay to just force-merge this?

cevich avatar May 29 '24 14:05 cevich

Doesn't seem to work. Wants to rerun CI, which is going to fail.

edsantiago avatar May 29 '24 14:05 edsantiago

Dang :cry:

cevich avatar May 29 '24 15:05 cevich

Oh! You could stick '[skip-ci]' in the title, and re-push. That will 100% bypass all of Cirrus-CI.

cevich avatar May 29 '24 15:05 cevich

#338 now contains the wanted fixes from this PR, so I think this can just be closed after #338 merges.

I’m leaving it open for now just to reduce the number of changes in flight.

mtrmac avatar May 29 '24 16:05 mtrmac

#338 was merged, contains the commits we want, and makes the attempts to pin systemd unnecessary.

@edsantiago @cevich thanks!

mtrmac avatar May 29 '24 18:05 mtrmac