
[ci] Introduce a test orchestrator written in bazel

Open pamaury opened this issue 1 year ago • 11 comments

Prototype of the test orchestrator as discussed in the software WG. The implementation follows what we have discussed: opentitan_test calls a ci_orchestrator function with the list of all execution environments. This function then decides which environment(s) to run in CI and returns this list. Then opentitan_test marks every test not in that list as skip_in_ci.
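The flow described above could be sketched roughly as follows, in Python-flavoured pseudocode. The function names, the selection policy (prefer the first FPGA environment), and the environment names are all illustrative, not the actual implementation in this PR:

```python
def ci_orchestrator(test_name, exec_envs):
    """Decide which execution environments should run in CI.

    Illustrative policy only: prefer the first FPGA environment,
    otherwise fall back to the first environment in the list.
    """
    fpga_envs = [env for env in exec_envs if env.startswith("fpga_")]
    if fpga_envs:
        return [fpga_envs[0]]
    return exec_envs[:1]

def mark_skipped(exec_envs, selected):
    """Return per-environment extra tags: anything the orchestrator
    did not select is tagged skip_in_ci."""
    return {
        env: [] if env in selected else ["skip_in_ci"]
        for env in exec_envs
    }
```

A call like `mark_skipped(envs, ci_orchestrator(name, envs))` then yields the extra tags that opentitan_test would attach to each per-environment test target.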

This PR also removes the old complicated logic in the azure pipeline and fpga runner script. This is done by moving the FPGA test job to its own template file to avoid duplication.

pamaury avatar Jun 05 '24 13:06 pamaury

How should the orchestrator interact with the broken tag? Should it look for this and try to run the next available environment that isn’t broken?

jwnrt avatar Jun 05 '24 17:06 jwnrt

> How should the orchestrator interact with the broken tag? Should it look for this and try to run the next available environment that isn’t broken?

Yes, that's my thinking (although it's not implemented yet). Otherwise it would require a lot of manual fiddling for broken tests. Thanks for raising this point :)
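The broken-tag fallback being discussed could look something like this hypothetical helper (not part of the PR), which walks the environment list in order and picks the first one not tagged broken:

```python
def pick_non_broken(exec_envs, tags_by_env):
    """Return the first execution environment whose tags do not
    include "broken", or None if every environment is broken.
    Hypothetical sketch of the fallback behaviour discussed above."""
    for env in exec_envs:
        if "broken" not in tags_by_env.get(env, []):
            return env
    return None
```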

pamaury avatar Jun 05 '24 20:06 pamaury

One of the issues I am running into is that some jobs can end up with no tests to run due to the orchestration. Unfortunately, bazel errors out if the test tag filters produce 0 tests, and I don't think this behaviour can be turned off.

pamaury avatar Aug 14 '24 12:08 pamaury

Is there a way to ask Bazel for a list of tests before running them? Then we can have a pre-step which exits early if there are none

jwnrt avatar Aug 14 '24 13:08 jwnrt

> Is there a way to ask Bazel for a list of tests before running them? Then we can have a pre-step which exits early if there are none

We can of course replace the tag filters with a query to create a list and then run it. The downside is that the test tag filter syntax is not supported by the query command (as far as I know), so it would require some parsing, which is not ideal.
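To illustrate what that parsing would involve, here is a sketch of the `--test_tag_filters` semantics in Python (my reading of the bazel behaviour: a test matches if it carries none of the negated tags and, when any positive tags are given, at least one of them; this helper is not part of the PR):

```python
def parse_tag_filters(filters):
    """Split a bazel-style tag filter string like "a,b,-c" into
    (positive, negative) tag sets."""
    positive, negative = set(), set()
    for item in filters.split(","):
        item = item.strip()
        if item.startswith("-"):
            negative.add(item[1:])
        elif item:
            positive.add(item)
    return positive, negative

def matches(tags, positive, negative):
    """Mimic --test_tag_filters: reject on any negative tag,
    then require at least one positive tag if any were given."""
    tags = set(tags)
    if tags & negative:
        return False
    return not positive or bool(tags & positive)
```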

I can see one possible alternative: currently, even with the CI orchestrator, we still split the tests across different jobs, which is error-prone. We could make the CI orchestration a two-step process:

  • first, tag some tests with skip_in_ci as we currently do, to avoid running tests that bring no value
  • then, tag each non-skipped test with a unique tag identifying which FPGA job it should run on. This would effectively move the "job filtering" into the orchestrator, which is easier to do in starlark.

The advantage would be that each CI job becomes trivial to filter: all tests with a specific tag, not skipped in CI, and not broken. It would also ensure that we don't run duplicate tests across jobs (or forget some tests) due to logic errors in the filtering.
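Under that scheme, the per-job filter string becomes purely mechanical to build. A tiny sketch (the tag names skip_in_ci and broken come from the thread; the helper itself is hypothetical):

```python
def job_filter(job_tag):
    """Build the trivial per-job test tag filter described above:
    everything with this job's unique tag, minus skipped and broken tests."""
    return ",".join([job_tag, "-skip_in_ci", "-broken"])
```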

pamaury avatar Aug 14 '24 13:08 pamaury

That sounds good, but I had actually thought that was what your current implementation does. Each test is tagged with its execution environment, right? And the Azure jobs run the tests with that exec env tag plus -broken,-skip_in_ci.

Wouldn’t this still have the same problem though? The job applies these filters but gets zero tests back and the job fails?

jwnrt avatar Aug 14 '24 13:08 jwnrt

Currently it is not quite the case that each job corresponds to an execution environment, because we have manufacturing tests that run in their own job.

To make it clear what I am suggesting, there would be a bazel function:

def ci_fpga_job(env, tags):
    """Return the FPGA job on which this test should run, based on its
    execution environment and its tags.
    """
    if env == "cw340_sival_rom_ext" and "manuf" not in tags:
        return "fpga_job_blabla"
    # and so on

and in rules/defs.bzl we could call this function to add the tag to the list. Then the azure file simply specifies the (unique) tag of each fpga job. For example we have some FPGA job filters like cw310_rom_with_fake_keys,cw310_rom_with_real_keys,-manuf.
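Wiring this into the tag list could look roughly like the following self-contained sketch. The job names and the manufacturing branch are hypothetical, chosen only to mirror the snippet above; the real mapping would live in the orchestrator:

```python
def ci_fpga_job(env, tags):
    """Hypothetical job mapping mirroring the sketch above:
    manufacturing tests go to their own job."""
    if env == "cw340_sival_rom_ext" and "manuf" not in tags:
        return "fpga_job_cw340_sival_rom_ext"
    if "manuf" in tags:
        return "fpga_job_manuf"
    return None

def assign_job_tag(env, tags):
    """What the call site in rules/defs.bzl might do: append the
    unique job tag computed by ci_fpga_job, if any."""
    job = ci_fpga_job(env, tags)
    return tags + [job] if job else tags
```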

Or maybe we should just keep one job per exec_env and call it a day :)

pamaury avatar Aug 14 '24 13:08 pamaury

What's the reason that we currently run manufacturing tests as separate jobs? Probably just because we want to split into more jobs for parallelism?

nbdd0121 avatar Aug 14 '24 13:08 nbdd0121

I think it might be because they have custom execution environments, since they run at unusual OTP stages (test_unlocked, rma, etc.). If we mix them into other tests, we might end up reprogramming bitstreams more often in the middle of the run, which would slow things down slightly.

jwnrt avatar Aug 14 '24 13:08 jwnrt

Build failure is due to #24395. Will wait until this is fixed.

pamaury avatar Aug 23 '24 12:08 pamaury

@cfrantz I have marked the relevant rom_ext e2e tests as broken in this PR. I suggest we mark them as unbroken when we backport your fixes in #24430

pamaury avatar Aug 29 '24 14:08 pamaury

Successfully created backport PR for earlgrey_1.0.0:

  • #24868

github-actions[bot] avatar Oct 24 '24 10:10 github-actions[bot]