test flakes tracker

Open cgwalters opened this issue 1 year ago • 14 comments

Parsing layer blob: Broken pipe

stderr: "\e[31mERROR\e[0m Switching: Pulling: Importing: Parsing layer blob sha256:4367367aae6325ce7351edb720e7e6929a7f369205b38fa88d140b7e3d0a274f: Broken pipe (os error 32)"

This one is like my enemy! I have a tracker over here for it https://github.com/coreos/rpm-ostree/issues/4567 too

cgwalters avatar Jun 01 '24 12:06 cgwalters

But anyways I think the larger problem pointed out by the aws error message is the script hardcodes a security group in a specific AZ, when it could really be targeting any AZ right?

There's only one Zone we can use because RHEL needs internal access to install podman in order to run the bootc install command. IT only configured one subnet in one Zone.

For the non-RHEL tests we already look up the available Zone: https://gitlab.com/fedora/bootc/tests/bootc-workflow-test/-/blob/2bebcdd18f4e0ff9639aff59e2fdfdfcec70f450/playbooks/deploy-aws.yaml#L55.

A few things on this. First it seems like a lot of this script is a basic "provision an ec2 instance" code that could probably be shared and live outside this repo? Maybe we fetch this stuff from a container or a distinct repo?

That's what I'd like to discuss with you at Monday's QE sync meeting.

henrywang avatar Jun 01 '24 13:06 henrywang

There's only one Zone we can use because RHEL needs internal access to install podman in order to run the bootc install command. IT only configured one subnet in one Zone.

OK, got it. Well...per the other discussion, what if we focused only on fedora:40 and centos:stream9 for PR testing by default, and did rhel integration testing both post merge (I'll get the -dev images re-spun up which build relevant things from git main) and also as part of dist-git merges to https://gitlab.com/redhat/centos-stream/rpms/bootc/ ?

cgwalters avatar Jun 01 '24 13:06 cgwalters

OK, got it. Well...per the other discussion, what if we focused only on fedora:40 and centos:stream9 for PR testing by default

I agree.

and did rhel integration testing both post merge (I'll get the -dev images re-spun up which build relevant things from git main) and also as part of dist-git merges to https://gitlab.com/redhat/centos-stream/rpms/bootc/ ?

Just like you mentioned above, a rhel-bootc-dev repo can be added just like centos-bootc-dev, and the -dev image can be stored in a GitLab repo (repos under https://gitlab.com/redhat/rhel/bifrost should be private?). I can add a test job in this repo without adding any test code, running the pipeline only with the code from https://gitlab.com/fedora/bootc/tests/bootc-workflow-test. The -dev image can be built daily, and the tests will run daily as well.

I would not suggest adding testing in https://gitlab.com/redhat/centos-stream/rpms/bootc/, to avoid blocking releases. From my perspective, all tests should be run before release, not on release.

henrywang avatar Jun 01 '24 14:06 henrywang

Recently, over roughly the last week, this error has shown up more often. As a workaround, the automation now retries up to 3 times in the Ansible playbook. Let's see what happens with the retries in place.
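For anyone wanting the same workaround outside Ansible, here is a minimal sketch of a bounded retry in bash. The `retry` function, its argument order, and the delay are hypothetical and not taken from the actual playbook; it simply reruns a command until it succeeds or the attempt limit is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the real playbook): run a command up to
# max_attempts times, sleeping delay seconds between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        # Success on any attempt ends the loop immediately.
        if "$@"; then
            return 0
        fi
        if (( attempt >= max_attempts )); then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$(( attempt + 1 ))
    done
}

# Example invocation (IMAGE is a placeholder you would set yourself):
#   retry 3 10 bootc switch "$IMAGE"
```

Note that a blanket retry like this hides the flake rather than fixing it, so it is probably worth logging each failed attempt (as the sketch does on stderr) so the failure rate stays visible.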

henrywang avatar Jun 08 '24 05:06 henrywang

In a different run, we somehow ended up with

Creating root filesystem (xfs) on device /dev/loop0p2 (size=512M)

Which seems related but different from the other one:

Creating root filesystem (xfs) on device /dev/loop0p1 (size=1M)

Actually, having it be 1M sometimes and 512M others looks very much like we're getting partitions swapped.

cgwalters avatar Jun 24 '24 21:06 cgwalters

Recently the test has been hitting the following error when running bootc install to-existing-root: Installing to filesystem: Creating ostree deployment: Performing deployment: Importing: Parsing layer blob sha256:9536e521dd6b076e09fa076feb4428e4b94e5330c6d6b3ab1e235a54be3d88b7: Failed to invoke skopeo proxy method FinishPipe: remote error: write |1: broken pipe

henrywang avatar Jul 03 '24 05:07 henrywang

@henrywang is there anything we can do to fix/improve this?

[13:43:01] [E] [CentOS-Stream-9:x86_64:/plans/e2e/to-disk] guest provisioning failed: Guest couldn't be provisioned: Artemis resource ended in 'error' state

As seen on e.g. https://artifacts.dev.testing-farm.io/4fec6905-15b7-49d6-aff5-2bad9d78a12e/

Having basically permanently-red CI adds mental overhead, since each time we have to check which specific jobs are failing.

cgwalters avatar Jul 16 '24 17:07 cgwalters

Yes, there is an issue tracking this: https://issues.redhat.com/browse/TFT-2691.

henrywang avatar Jul 17 '24 14:07 henrywang

Actually, having it be 1M sometimes and 512M others looks very much like we're getting partitions swapped.

I didn't try to stress test this much, but I think https://github.com/containers/bootc/pull/698 is going to help. At the very least if we are still racing somehow, we'll get a more clear error message.

cgwalters avatar Jul 17 '24 15:07 cgwalters

I didn't try to stress test this much, but I think https://github.com/containers/bootc/pull/698 is going to help. At the very least if we are still racing somehow, we'll get a more clear error message.

I think that fixed the install flake, haven't seen it since.

cgwalters avatar Aug 01 '24 00:08 cgwalters

Recently, the install to-existing-root test has hit the following error in some runs: Installing to filesystem: Creating ostree deployment: Pulling: Importing: Unencapsulating base: failed to invoke method FinishPipe: failed to invoke method FinishPipe: expected 45 bytes in blob, got 139264. I think we should take a look at this error. Thanks.

Examples of failed logs:

  • https://artifacts.dev.testing-farm.io/c4f7b9ab-02f7-485f-84dd-9f55559c9129/
  • https://artifacts.dev.testing-farm.io/e9de5ed4-d125-4167-8968-ecfbbbe94072/

henrywang avatar Oct 01 '24 17:10 henrywang

Recently, the install to-existing-root test has hit the following error in some runs: Installing to filesystem: Creating ostree deployment: Pulling: Importing: Unencapsulating base: failed to invoke method FinishPipe: failed to invoke method FinishPipe: expected 45 bytes in blob, got 139264. I think we should take a look at this error. Thanks.

Examples of failed logs:

  • https://artifacts.dev.testing-farm.io/c4f7b9ab-02f7-485f-84dd-9f55559c9129/
  • https://artifacts.dev.testing-farm.io/e9de5ed4-d125-4167-8968-ecfbbbe94072/

I noted this one over in the ostree-rs-ext tracker, it's likely related to the other similar issues around broken pipes.

jeckersb avatar Oct 01 '24 17:10 jeckersb

This issue seems to occur only on bare metal machines (the Testing Farm public ranch runs virtualization tests on AWS bare metal instances). I can't reproduce it in a nested virtualization environment running the same test script. Is it possible that this issue is related to disk I/O?

henrywang avatar Oct 02 '24 16:10 henrywang

This issue seems to occur only on bare metal machines (the Testing Farm public ranch runs virtualization tests on AWS bare metal instances). I can't reproduce it in a nested virtualization environment running the same test script. Is it possible that this issue is related to disk I/O?

Hmm could be. Any idea what kind of storage is used on the baremetal instances?

I'm thinking of trying to reproduce in a virtualized environment by attaching the disk via nbd and then using the spinning filter to simulate a slow disk.

jeckersb avatar Oct 07 '24 20:10 jeckersb

Hi @jeckersb, do you know of any workaround for the error Installing to filesystem: Creating ostree deployment: Pulling: Importing: Unencapsulating base: failed to invoke method FinishPipe: failed to invoke method FinishPipe: expected 45 bytes in blob, got 139264? It is causing a lot of failures in our CI. Thanks!

henrywang avatar Oct 28 '24 15:10 henrywang

@henrywang isn't that https://github.com/containers/bootc/issues/509#issuecomment-2419695362 ? Is the input image zstd:chunked? Is it a RHEL10 system?

cgwalters avatar Oct 28 '24 15:10 cgwalters

It's a C10S system. Yeah, same thing as RHEL 10. Might the following workaround work? Thanks.

if [[ "${REDHAT_VERSION_ID%%.*}" == "10" ]]; then
    sed -i 's/^compression_format = .*/compression_format = "gzip"/' /usr/share/containers/containers.conf
fi
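One caveat with the snippet above: the sed expression only rewrites a `compression_format` line that already exists and is uncommented, so on a file where the setting is absent or commented out it silently does nothing. A minimal sketch that applies the same substitution and then verifies the result follows. The function name `force_gzip_compression` is hypothetical, and the sketch assumes GNU sed (for `-i`) and a path you pass in yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the gzip workaround to a containers.conf-style
# file, then confirm an active compression_format line now says "gzip".
# Assumes GNU sed for in-place editing.
force_gzip_compression() {
    local conf=$1
    # Same substitution as the workaround above, parameterized on the path.
    sed -i 's/^compression_format = .*/compression_format = "gzip"/' "$conf"
    # sed exits 0 even when nothing matched, so check explicitly; this
    # returns non-zero if no uncommented gzip setting is present.
    grep -q '^compression_format = "gzip"$' "$conf"
}

# Example invocation:
#   force_gzip_compression /usr/share/containers/containers.conf \
#       || echo "compression_format was not set; edit the file manually" >&2
```

Failing loudly here makes it easier to tell a workaround that never applied apart from a genuine recurrence of the flake.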

henrywang avatar Oct 28 '24 15:10 henrywang

Yep per https://github.com/containers/bootc/issues/509#issuecomment-2430080290 that's what the new default will be, hopefully soon

cgwalters avatar Oct 28 '24 17:10 cgwalters

I think we're good on this!

cgwalters avatar Nov 05 '24 19:11 cgwalters

Just to cross-link, since this one shows up at the top of a Google search right now: https://github.com/bootc-dev/bootc/issues/1204 is still here, or came back.

cgwalters avatar Mar 14 '25 17:03 cgwalters