SEGV in bud-multiple-platform-no-run test

Open edsantiago opened this issue 2 years ago • 6 comments

Source: nightly CI run

[+1078s] not ok 250 bud-multiple-platform-no-run
         # (from function `die' in file tests/helpers.bash, line 305,
         #  from function `run_buildah' in file tests/helpers.bash, line 292,
         #  in test file tests/bud.bats, line 3612)
         #   `run_buildah build --signature-policy ${TESTSDIR}/policy.json --jobs=0 --all-platforms --manifest $outputlist -f ${TESTSDIR}/bud/multiarch/Dockerfile.no-run ${TESTSDIR}/bud/multiarch' failed with status 125
         # /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.24.1/tests /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.24.1
         # $ podman-remote build --force-rm=false --layers=false --signature-policy /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.24.1/tests/policy.json --jobs=0 --all-platforms --manifest localhost/testlist -f /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.24.1/tests/bud/multiarch/Dockerfile.no-run /var/tmp/go/src/github.com/containers/podman/test-buildah-v1.24.1/tests/bud/multiarch
         # time="2022-02-18T10:38:26-06:00" level=error msg="While applying layer: ApplyLayer exit status 1 stdout:  stderr: lchown /etc/printcap: no such file or directory"
         # time="2022-02-18T10:38:27-06:00" level=error msg="error unmounting container: error unmounting build container \"\": layer not known"
         # panic: runtime error: invalid memory address or nil pointer dereference
         # [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14a29f2]
         # 
         # goroutine 88 [running]:
         # github.com/containers/buildah/imagebuildah.(*StageExecutor).Execute(0xc0001dcf70, 0x1fc3ae8, 0xc000c39770, 0xc0011fc690, 0x18, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
         # 	/var/tmp/go/src/github.com/containers/podman/vendor/github.com/containers/buildah/imagebuildah/stage_executor.go:953 +0x2032
         # github.com/containers/buildah/imagebuildah.(*Executor).buildStage(0xc000936800, 0x1fc3ae8, 0xc000c39770, 0xc0011faf90, 0xc0001001e0, 0x2, 0x2, 0x0, 0xc000c1f380, 0x14aac4a, ...)
         # 	/var/tmp/go/src/github.com/containers/podman/vendor/github.com/containers/buildah/imagebuildah/executor.go:484 +0x2cc
         # github.com/containers/buildah/imagebuildah.(*Executor).Build.func3.1(0xc000936800, 0xc000e82610, 0xc000e828be, 0xc0011faf90, 0xc0001001e0, 0x2, 0x2, 0x0, 0xc000dfede0, 0x1fc3ae8, ...)
         # 	/var/tmp/go/src/github.com/containers/podman/vendor/github.com/containers/buildah/imagebuildah/executor.go:679 +0x39a
         # created by github.com/containers/buildah/imagebuildah.(*Executor).Build.func3
         # 	/var/tmp/go/src/github.com/containers/podman/vendor/github.com/containers/buildah/imagebuildah/executor.go:663 +0x26a
         # [linux/s390x] [2/2] STEP 1/2: FROM registry.access.redhat.com/ubi8-micro

See link for full stacktrace, which should make it easy to diagnose.

edsantiago, Feb 21 '22 17:02

See also: https://github.com/containers/buildah/issues/3710

cevich, Feb 22 '22 16:02

Probably a dumb question, but I noticed that the first base image in the Dockerfile is pulled from Docker Hub. Could that be a potential issue? Although I'd expect a rate-limit error in that case instead.

# A base image that is known to be a manifest list.
FROM docker.io/library/alpine
COPY Dockerfile.no-run /root/
# A different base image that is known to be a manifest list, supporting a
# different but partially-overlapping set of platforms.
FROM registry.access.redhat.com/ubi8-micro
COPY --from=0 /root/Dockerfile.no-run /root/

TomSweeneyRedHat, Feb 22 '22 17:02

The SEGV can be avoided with a nil check, but I think the root cause is the one being discussed here: https://github.com/containers/buildah/issues/3710#issuecomment-1049016014

flouthoc, Feb 23 '22 17:02
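
For illustration only, here is a minimal Go sketch of the kind of nil check mentioned in the comment above. Every name in it (`stage`, `builder`, `prepare`, `unmount`) is hypothetical and is not taken from buildah's code. The sketch assumes the crash pattern is a cleanup path dereferencing a pointer that was never set because an earlier step failed; the thread does not confirm what is actually on the faulting line.

```go
package main

import (
	"errors"
	"fmt"
)

// builder is a hypothetical stand-in for a build container handle.
type builder struct {
	containerID string
}

// unmount reads a field, so calling it on a nil *builder would panic
// with a nil pointer dereference.
func (b *builder) unmount() error {
	fmt.Printf("unmounting %q\n", b.containerID)
	return nil
}

// stage is a hypothetical stand-in for one build stage.
type stage struct {
	b *builder
}

// prepare simulates setup that may fail before the builder is assigned.
func (s *stage) prepare(fail bool) error {
	if fail {
		return errors.New("applying layer failed")
	}
	s.b = &builder{containerID: "abc123"}
	return nil
}

// execute runs the stage and always attempts cleanup on the way out.
func (s *stage) execute(fail bool) (err error) {
	defer func() {
		// The nil check: skip cleanup when setup never produced a builder.
		if s.b == nil {
			return
		}
		if uerr := s.b.unmount(); uerr != nil && err == nil {
			err = uerr
		}
	}()
	if err = s.prepare(fail); err != nil {
		return fmt.Errorf("preparing stage: %w", err)
	}
	return nil
}

func main() {
	failing := &stage{}
	fmt.Println("failed setup:", failing.execute(true))

	working := &stage{}
	fmt.Println("successful setup:", working.execute(false))
}
```

A guard like this only turns the panic into an ordinary error return. It does not address whatever made the setup step fail, which is the root cause referred to in the linked discussion.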

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Mar 26 '22 00:03

@flouthoc what is going on with this issue?

rhatdan, Mar 26 '22 09:03

@rhatdan I lost track of this, but I'm picking it up again and working on fixing the root cause here: https://github.com/containers/buildah/issues/3710#issuecomment-1049016014

flouthoc, Mar 28 '22 10:03

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Nov 18 '22 00:11

I haven't seen this in any of the nightly jobs in recent memory, though @edsantiago may have actual data from a bigger dataset.

cevich, Nov 21 '22 18:11

https://github.com/containers/buildah/blob/117e97d9c607e3f600027a2b811b026f1ba4e357/tests/bud.bats#L5129-L5132

edsantiago, Nov 21 '22 19:11

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Dec 24 '22 00:12

@flouthoc what do you want to do with this one?

rhatdan, Jan 03 '23 18:01

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Feb 11 '23 00:02

@edsantiago @flouthoc is this still happening?

rhatdan, Feb 19 '23 18:02

I reenabled the disabled code in #4537 a few weeks ago. We'll find out if it's truly fixed or not.

edsantiago, Feb 20 '23 11:02