
(Optionally) let the build pipeline continue even if certain builds or tests fail


Description

Current behavior

The GitHub Actions pipeline can be configured to build/test multiple configurations of Python versions, OSes, and architectures. The pypa/cibuildwheel job step will stop immediately (and raise an error) if the build or test fails on a certain configuration. All the wheel files built before the error are still kept in the destination folder.

Desired feature

Sometimes in the development process we would like to let the pipeline try to build ALL configurations, regardless of whether a certain configuration fails. In this way we can achieve two things:

  1. Check for potential build errors on all configurations.
  2. Download the wheel files (via artifacts) that are successfully built (but may still fail the tests), and install and test them locally. This is important because many co-workers/testers do not have a local build environment; they may want to download a pre-built test wheel to do their local testing.

I can set continue-on-error for the job step, see here. However, the job step will still skip the remaining configurations. I think there could be an option for pypa/cibuildwheel to continue building all configurations.

Build log

No response

CI config

No response

TonyXiang8787 avatar Mar 24 '22 08:03 TonyXiang8787

Bumping this issue, since there hasn't been a response from maintainers. I am interested in having this feature, and would be willing to contribute a PR if maintainers want this feature.

reynoldsnlp avatar Jun 07 '23 03:06 reynoldsnlp

I see some benefits from this, but it sometimes happens that the build fails because of things unrelated to cibuildwheel (like a PyPI failure).

It is especially dangerous in the context of the release process, when some of the wheels may not be uploaded to PyPI and this could go unnoticed for a long time.

On the other hand, enabling such a feature for dev builds would make it possible to see which builds fail/pass.

So I could agree to such a feature, but if even a single build fails, cibuildwheel should end with a non-zero exit code (every CI that I checked allows ignoring the exit code from a previous step). Also, it should allow setting a maximum fail count.

Czaki avatar Jun 07 '23 08:06 Czaki

@Czaki I agree that this should only affect cibuildwheel's behavior; downstream steps should still be blocked by a non-zero exit status. I like the idea of a max fail count. It seems like the simplest mechanism would be an environment variable (something like CIBW_ALLOW_FAILS) with a default value of 0. This number would be used independently on each platform/image, so if you set it to 2 and run on three platforms/images, the maximum number of allowed failures would be 6. Platform- and image-specific values could be set with platform/image suffixes, such as CIBW_ALLOW_FAILS_LINUX or CIBW_ALLOW_FAILS_MANYLINUX_X86_64.

If someone wants to allow everything to fail, they can simply set the number higher than the number of possible builds per platform/image (currently 17, I think). The docs could recommend using an impossibly high number as a substitute for infinity.
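
To make that concrete, here is a rough sketch of how the suffix lookup could work - the option names are purely hypothetical, nothing like this exists in cibuildwheel today:

```python
import os

def allowed_failures(platform: str) -> int:
    """Resolve the hypothetical CIBW_ALLOW_FAILS setting for one platform.

    A platform-specific variable (e.g. CIBW_ALLOW_FAILS_LINUX) would take
    precedence over the generic CIBW_ALLOW_FAILS; the default is 0, i.e.
    any failure aborts the run, as it does today.
    """
    for name in (f"CIBW_ALLOW_FAILS_{platform.upper()}", "CIBW_ALLOW_FAILS"):
        value = os.environ.get(name)
        if value is not None:
            return int(value)
    return 0

# Example: CIBW_ALLOW_FAILS=2 would allow up to two failed builds per platform run.
```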

reynoldsnlp avatar Jun 07 '23 14:06 reynoldsnlp

I think this is a good idea for an option. Off the top of my head, CIBW_CONTINUE_ON_FAILURE: false|true|<number> makes sense (a rough parsing sketch is below). Definitely agree that we should return a failure exit code even if we continued.

In terms of code, it's gonna add yet another level of indentation to our build scripts, which is annoying. But that shouldn't stop us. Also, we might have to take a look at how errors are handled in the build scripts, to ensure proper exceptions are raised rather than calling sys.exit or raising SystemExit.
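
For illustration, interpreting the proposed false|true|<number> value might look something like this (the option name and helper are hypothetical):

```python
import os

def continue_on_failure_limit() -> int:
    """Interpret the proposed CIBW_CONTINUE_ON_FAILURE value (hypothetical).

    Returns the maximum number of failures to tolerate: 0 for "false"
    (current behaviour), a very large number for "true", or the given integer.
    """
    raw = os.environ.get("CIBW_CONTINUE_ON_FAILURE", "false").strip().lower()
    if raw in ("false", "0", ""):
        return 0
    if raw == "true":
        return 10**9  # effectively unlimited
    return int(raw)
```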

joerick avatar Jun 07 '23 14:06 joerick

The question is whether a failure in the tests and a failure in the build process should be treated differently.

In terms of code, it's gonna add yet another level of indentation to our build scripts, which is annoying

Split into functions? Or do you think that maintainability may be a problem?

Czaki avatar Jun 07 '23 15:06 Czaki

The question is whether a failure in the tests and a failure in the build process should be treated differently.

I think for simplicity we shouldn't treat these differently: wheels that fail the testing phase should not be copied to output, but we can allow the other builds to continue.

In terms of code, it's gonna add yet another level of indentation to our build scripts, which is annoying

Split into functions? Or do you think that maintainability may be a problem?

Yeah, can try - a function that builds one build identifier might make sense. Then we could loop outside that function and catch errors outside it too.
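
Something like this rough shape, with invented names (build_one, build_all) standing in for whatever the real build scripts end up using:

```python
def build_one(identifier: str) -> None:
    """Hypothetical placeholder for the real per-identifier build/test logic.

    It should raise an exception on failure instead of exiting the process.
    """
    ...

def build_all(identifiers: list[str], continue_on_failure: bool) -> None:
    failed: list[str] = []
    for identifier in identifiers:
        try:
            build_one(identifier)
        except Exception:
            failed.append(identifier)
            if not continue_on_failure:
                raise
    if failed:
        # Reported only after every identifier was attempted; the caller turns
        # this into a non-zero exit code so downstream steps are still blocked.
        raise RuntimeError(f"{len(failed)} build(s) failed: {', '.join(failed)}")
```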

joerick avatar Jun 07 '23 15:06 joerick

wheels that fail the testing phase should not be copied to output

The OP was specifically asking to be able to download wheels that fail tests, and I have wanted to do the same.

reynoldsnlp avatar Jun 07 '23 15:06 reynoldsnlp

I'm a fan of using exceptiongroups for this. :)

I think starting with continue-on-failure true/false would be fine; having a count seems unnecessary unless it would report "passed" when that number of failures happened. And you'd have no control over which jobs were allowed to fail. If we want that sort of control, having an allowed-fail list of identifiers (following the syntax of build/skip) would be better, IMO. But I think the main reason for doing this (getting a build output if the test fails for some platform where you've verified the failures aren't that important) is already covered by the test skipping mechanism.

Yes, I'd agree that you want test failures to still provide wheels, and that cibuildwheel should still report the error at the end and display all the failures from the ExceptionGroup.
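
A minimal sketch of the ExceptionGroup approach (Python 3.11+, where ExceptionGroup and add_note are built in; build_one is a hypothetical per-identifier helper):

```python
def build_one(identifier: str) -> None:
    """Hypothetical per-identifier build; raises on failure."""
    ...

def build_all(identifiers: list[str]) -> None:
    errors: list[Exception] = []
    for identifier in identifiers:
        try:
            build_one(identifier)
        except Exception as exc:
            exc.add_note(f"while building {identifier}")
            errors.append(exc)
    if errors:
        # Raised once at the end: the traceback shows every failure, and the
        # process still exits non-zero.
        raise ExceptionGroup("some builds failed", errors)
```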

henryiii avatar Jun 07 '23 15:06 henryiii

Also, I think we should wait to start working on this until after #1456 is in.

henryiii avatar Jun 07 '23 15:06 henryiii

From my experience, I would prefer to allow setting a maximum number of failed steps. Sometimes you know that some Python version is flaky, but waiting for all builds only to see that all of them fail is a waste of time.

Czaki avatar Jun 07 '23 15:06 Czaki

wheels that fail the testing phase should not be copied to output

The OP was specifically asking to be able to download wheels that fail tests, and I have wanted to do the same.

Ah, hm, interesting. That might complicate implementation a bit, but seems reasonable enough.

joerick avatar Jun 07 '23 16:06 joerick

It happens that when I debug a macOS build, I need to disable the tests to obtain the wheels and inspect the binaries (using otool) to understand where the problem is. So I understand why they want to have such a feature.

Czaki avatar Jun 07 '23 16:06 Czaki

I would prefer to allow setting a maximum number of failed steps. Sometimes you know that some Python version is flaky, but waiting for all builds only to see that all of them fail is a waste of time.

That would then unexpectedly pass if a flaky job passed but a job you weren't expecting to fail did fail - the total number would still be below your threshold. I'd much rather have a way to specify flaky jobs - an "allowed failure" list - and that could be a future feature. Having it build as many wheels as possible and still report an error code would allow you to process this yourself: if it fails, you can count the produced wheels or check to see which ones are missing and decide what to do from there (see the sketch below).

I think we should keep a simple continue-on-failure true/false, with a possibility of adding allowed-fail later if it still seems to be needed. I think continue-on-failure would be easy to understand and pretty simple to implement.
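
For the "process this yourself" part, a downstream step could be as simple as counting the wheels in the output directory - this sketch assumes the default wheelhouse folder and an expected count that you supply:

```python
import sys
from pathlib import Path

EXPECTED = 12  # however many wheels your matrix should produce (assumption)

# Assumes cibuildwheel's default output directory.
wheels = sorted(Path("wheelhouse").glob("*.whl"))
print(f"built {len(wheels)} of {EXPECTED} wheels:")
for wheel in wheels:
    print(f"  {wheel.name}")
if len(wheels) < EXPECTED:
    sys.exit(1)  # decide here whether missing wheels should fail the pipeline
```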

henryiii avatar Jun 22 '23 14:06 henryiii