operator-sdk icon indicating copy to clipboard operation
operator-sdk copied to clipboard

Have `operator-sdk scorecard` return different exit codes when it fails a check than when it encounters a fatal error

Open komish opened this issue 4 years ago • 6 comments

Feature Request

Describe the problem you need a feature to resolve.

I'm writing some automation that calls out to operator-sdk scorecard, and have noticed that it will return the same exit code (1) if the tool either encountered a fatal error, or succeeded in execution but the bundle being tested did not pass a test.

Effectively, this means that I must parse stdout/stderr to determine if the tool was able to parse the bundle at all, or encountered a fatal error in the process. I'm assuming that having nothing in stderr implies the tool did not encounter an error in execution, so I can then parse stdout which should contain the results.

As an example, I'm running something like this

# operator-sdk scorecard --output json --selector=suite=olm my-bundle-image-here:latest
*** JSON OUTPUT OMITTED, but the tool completed and 
the output reflected that my bundle failed three tests ***

# echo $?
1

If I ran that with a bogus bundle image, it would also return 1, but I would see something a Fatal log line in stderr.

Describe the solution you'd like.

The easiest thing conceptually is to return a different non-zero exit code if it's a test failure vs. if it's a fatal error. So if any test failed, return 2, vs. returning 1 if the tool encountered a fatal error preventing it from executing.

As a semantic, I do feel like the tool should return 0 even if a test fails (to tool itself worked!). But I also understand how that might make certain automation scenarios a bit trickier (now the developer has to parse the output instead of just relying on the exit code to tell them that something needs their attention). With that in mind, the above solution feels more beneficial.

komish avatar Jul 14 '21 20:07 komish

Technically changing the return code for a failed test or a scorecard command error by default will be a breaking change, so there should be a flag to indicate there should be a different return value. As @jmrodri suggested in the most recent triage meeting, this flag should be named and perhaps accept parameters in a forward-compatible way, i.e. if there are more exit codes we want to add in the future.

I propose --test-exit-code[=2], which is obvious in name and can be modified to accept either an integer or some key=code value later for test- or scenario-specific codes. The default of 2 should be documented. Thoughts @theishshah @jmrodri @asmacdo?

estroz avatar Jul 19 '21 18:07 estroz

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Oct 17 '21 21:10 openshift-bot

/remove-lifecycle stale

Currently, in preflight, we are having to look for "FATA" in stderr to determine whether operator-sdk scorecard returned via 'log.Fatal' or, in the case of just a failed test, os.Exit(1). I am curious how others that rely on the exit code currently make this determination? I ask this because of the "technically a breaking change" comment. It seems to me that these are separate events (an actual error vs. a failed test).

bcrochet avatar Nov 10 '21 15:11 bcrochet

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Feb 08 '22 16:02 openshift-bot

/remove-lifecycle stale This is still an issue. While we have determined that the output differs based on whether there is a TTY or not, it still remains that an error vs a failed test are different outcomes. I would like an explanation why this would be considered a breaking change, other than "that's how it currently is".

bcrochet avatar Feb 10 '22 13:02 bcrochet

/lifecycle frozen

asmacdo avatar Feb 10 '22 17:02 asmacdo