Make the result of an empty plan configurable
When there is no test discovered (and executed) in any of the enabled plans which tmt run detects, it reports 3 as the exit code. This is then handled as an error when executed in Testing Farm. Some users would like to make it configurable how the empty plan scenario is handled.
The way proposed during the last discussion was to introduce a new option under the plan. It could look like this:
/plan:
discover:
how: fmf
execute:
how: tmt
empty-plan-result: pass / warn / fail / error / skip
Possibly the new empty-plan-result key could be stored under some of the steps, perhaps execute?
/plan:
discover:
how: fmf
execute:
how: tmt
empty-plan-result: pass / warn / fail / error / skip
The thing is that actually we don't have any plan result implemented. There are only results.yaml and based on them the appropriate exit code is reported. So should it be rather something like:
empty-plan-exit-code: 0
Another approach could be that the execute step would create a dummy result with the provided value. This would also suggest that the new option should belong under the execute step. Thoughts?
Related issue: https://issues.redhat.com/browse/TFT-2065
@fhrdina1, @kkaarreell, @lukaszachy, @thrix, @happz, @Athwale, @psklenar, here's summary from our discussion. Please have a look.
The thing is that actually we don't have any plan result implemented. There are only
results.yamland based on them the appropriate exit code is reported.
That is slightly inaccurate, I'd say: results.yaml is created per plan, so in this sense, we do have "plan result" - we don't have "plan exit code" because the eventual exit code is decided based on results collected from all plans of a run. And the "plan result" aka "plan exit code" derived from per-plan results never materializes.
This is mostly fine in TF, where tmt run is created for each discovered non-empty plan, and then "all results" are suddenly "plan results". So here a flag would work seamlessly. But then runs with multiple plans arrive on the scene. How about turning the piece of code that decided the exit code (tmt.base.Run.finish()) into something like this:
results = [
result
for plan in self.plans
for result in plan.execute.results()]
...
raise SystemExit(tmt.result.results_to_exit_code(results))
# could become:
exit_codes = [
tmt.result.results_to_exit_code(plan.execute.results())
for plan in self.plans]
...
raise SystemExit(max(exit_codes))
And then we can change the loop and start involving the plan's flag to override that plan's evaluation of its results. This would materialize "plan result", in the sense of "an exit code a plan would produce if there were no other plans that might make the result even worse".
So should it be rather something like:
empty-plan-exit-code: 0
All in all, both approaches, a flag to turn "empty" into "success" or a flag to turn "empty" into a given exit code, seem fine to me.
What happens when there are two plans in a run, one empty and one with failing tests - the usual "the worst wins" approach?
I can also see this fitting in the discover definition. I guess there would be some ambiguity when defining multiple deiscover steps.
For reference, ctest uses language like ctest --no-tests={ignore,error}
That is slightly inaccurate, I'd say:
results.yamlis created per plan, so in this sense, we do have "plan result" - we don't have "plan exit code" because the eventual exit code is decided based on results collected from all plans of a run. And the "plan result" aka "plan exit code" derived from per-plan results never materializes.
I meant that currently we even do not define any final result value like pass, fail, warn for the plan, results.yaml is just a pile of individual test results.
And then we can change the loop and start involving the plan's flag to override that plan's evaluation of its results. This would materialize "plan result", in the sense of "an exit code a plan would produce if there were no other plans that might make the result even worse".
Yes, "counting" the final exit code for each plan makes sense. On the other hand, I'm thinking, whether it would not be more natural for user to state the expected empty plan result as pass, fail... instead of specifying the exit code which usually is not that interesting from the user point of view.
What happens when there are two plans in a run, one empty and one with failing tests - the usual "the worst wins" approach?
If we provide overal result for each plan then we can do this worst-wins math, but for exit codes it's hard to say what is actually "worst". For example, how do we compare what's better, 3 (no test results) or 4 (all tests skipped)?
I can also see this fitting in the
discoverdefinition. I guess there would be some ambiguity when defining multiple deiscover steps.
Interesting thought, storing the empty-plan-result flag in the discover phase could also make sense. But do we really need that level of granularity? Any tangible/specific use case on your mind?
I can also see this fitting in the
discoverdefinition. I guess there would be some ambiguity when defining multiple deiscover steps.Interesting thought, storing the empty-plan-result flag in the
discoverphase could also make sense. But do we really need that level of granularity? Any tangible/specific use case on your mind?
I can see it go either way:
- it could help early exit if the discover phase returns empty so that prepare is not executed
- could be a more natural flow when you consider plugins for
discover, e.g.:/plan: discover: - how: cmake no_tests: pass - how: fmf path: path/to/additional_static_tests no_tests: fail - but then if there are multiple
discoversteps, then each item must define their parameter. https://github.com/teemtee/fmf/issues/228 might be useful for this though. Although, why not have it in both, whereexecutewould treat alldiscoverphase as a whole.
On the other hand, I'm thinking, whether it would not be more natural for user to state the expected empty plan result as
pass,fail... instead of specifying the exit code which usually is not that interesting from the user point of view.
+1. It would follow the approach used by result in test specification, using words instead of numbers.
What happens when there are two plans in a run, one empty and one with failing tests - the usual "the worst wins" approach?
If we provide overal
resultfor each plan then we can do this worst-wins math, but for exit codes it's hard to say what is actually "worst". For example, how do we compare what's better, 3 (no test results) or 4 (all tests skipped)?
Yep, that's the fun part :) We have 1 exit code and N plans, so eventually, we will need to reduce them somehow into a single number. Maybe both 3 and 4 have the same level of "badness", who knows...
This issue has been escalated as part of TFT-2065. Not blocking the gating efforts as for them missing test coverage should not be ignored but still useful for some use cases like tier testing. Proposing as a should-have for 1.36.
moved to 1.43 as requested by @psss , enotime for 1.42 unfortunately