dune

After a build, how to programmatically identify which parts of the build failed or succeeded?

Open · Khady opened this issue 1 year ago · 3 comments

I guess this is some kind of feature request, but I'm not completely sure how things should be defined, so I'm starting it as a discussion.

I have a large codebase to build. Currently, the CI runs multiple independent jobs in parallel; each one builds one part of the codebase and then, if the build succeeds, runs a number of additional tasks (linting, deployment, ...). There's some waste in this setup:

  • a good number of shared libraries are compiled multiple times
  • dune has to parse and compute all the rules in every job
  • opam packages must be installed in every job

So I've been looking at running a single job with a single dune command that does everything in one go: basically dune build @all @check @runtest @fmt @doc from the root of the repo. The build part works fine.

The first struggle is finding out which binaries and tests were built successfully, so that I know what can be deployed automatically. For binaries, I can check whether a file exists at _build/default/path/to/the/binary.exe, but that's pretty manual and brittle. For tests, there's nothing in _build that tells me whether they passed.
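As a rough illustration of the manual check for binaries, here is a minimal OCaml sketch that sorts a hand-maintained list of expected binaries into "built" and "missing". The list and its paths are hypothetical; the only assumption about dune is that artifacts of the default build context live under _build/default. A file being present may not prove it came from the current build (for example if _build is restored from a cache between CI runs), which is part of why this approach is brittle.

```ocaml
(* Hypothetical list of deployable binaries, relative to the workspace root. *)
let expected_binaries = [ "path/to/the/binary.exe"; "path/to/another/tool.exe" ]

(* dune places artifacts of the default build context under _build/default. *)
let artifact_path ~root p =
  List.fold_left Filename.concat root [ "_build"; "default"; p ]

(* Split [paths] into those whose artifact exists on disk and those that don't. *)
let partition_built ~root paths =
  List.partition (fun p -> Sys.file_exists (artifact_path ~root p)) paths

let () =
  let built, missing = partition_built ~root:"." expected_binaries in
  List.iter (Printf.printf "built:   %s\n") built;
  List.iter (Printf.printf "missing: %s\n") missing
```

The list has to be kept in sync with the dune files by hand, which is exactly the maintenance burden the question is trying to avoid.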

For now, the solution is probably to write rules that generate a file marking success?

```dune
(rule
 (alias prepare_for_deployment)
 (deps
  (alias_rec path/to/project/runtest)
  path/to/a/binary.exe)
 (targets path/to/project/ci-success.txt)
 (action
  (write-file %{targets} "well done")))
```
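If every deployable project gets a marker rule like this one, a CI step can discover which markers were produced instead of hard-coding paths. The OCaml sketch below walks _build/default and returns the directories that contain a ci-success.txt file. Only the file name comes from the rule; the function names and the directory walk are illustrative assumptions.

```ocaml
(* Marker file written by the prepare_for_deployment rule. *)
let marker = "ci-success.txt"

(* Sys.is_directory raises Sys_error on dangling symlinks; treat those as non-directories. *)
let is_dir path = try Sys.is_directory path with Sys_error _ -> false

(* Recursively collect the directories under [root] that contain [marker].
   Results are relative to [root] ("." for [root] itself) and sorted. *)
let find_markers root =
  let rec walk rel acc =
    let abs = if rel = "" then root else Filename.concat root rel in
    Array.fold_left
      (fun acc name ->
        let child = Filename.concat abs name in
        let rel' = if rel = "" then name else Filename.concat rel name in
        if name = marker && not (is_dir child) then
          (if rel = "" then "." else rel) :: acc
        else if is_dir child then walk rel' acc
        else acc)
      acc (Sys.readdir abs)
  in
  if is_dir root then List.sort compare (walk "" []) else []

let () =
  List.iter
    (Printf.printf "deployable: %s\n")
    (find_markers (Filename.concat "_build" "default"))
```

This only reports what succeeded; anything without a marker rule stays invisible, which is the coverage gap raised in the next paragraph.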

Is there a better way to do it that I could have missed?

The second struggle is reporting at the end of the build. I'd like to report what passed and what failed. People from different teams must be able to easily tell whether a failure comes from, or affects, their code. I could check all the ci-success.txt files and build a report from them, but that would only cover part of the repository: we might be building code that isn't covered by a prepare_for_deployment alias (on purpose or by accident, it doesn't really matter). One can just read the whole CI build log. That's not ideal, though: it can be pretty long and contain many messages unrelated to me, so it's too easy to miss something.
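A per-team report could be layered on top of the markers by mapping directories to owners. This sketch assumes a hypothetical ownership table (directory prefix to team) and a hypothetical list of project directories that are expected to have a marker. As noted above, a missing marker cannot distinguish "failed" from "not covered", so both are reported the same way.

```ocaml
type status = Passed | Missing (* Missing: the build failed, or no marker rule covers it *)

(* Hypothetical ownership table: directory prefix -> team. *)
let owners = [ ("services/payments", "payments-team"); ("libs/auth", "identity-team") ]

(* True if [dir] is [prefix] itself or lies below it ("libs/authz" is not under "libs/auth"). *)
let under ~prefix dir =
  let p = prefix ^ "/" in
  dir = prefix
  || (String.length dir >= String.length p && String.sub dir 0 (String.length p) = p)

(* Owning team of [dir], using the longest matching prefix. *)
let owner_of dir =
  let matching = List.filter (fun (prefix, _) -> under ~prefix dir) owners in
  let longest_first =
    List.sort (fun (a, _) (b, _) -> compare (String.length b) (String.length a)) matching
  in
  match longest_first with (_, team) :: _ -> team | [] -> "unowned"

(* One row per expected project directory: (team, dir, status). *)
let report ~expected ~passed =
  List.map
    (fun dir -> (owner_of dir, dir, if List.mem dir passed then Passed else Missing))
    expected

let print_report rows =
  List.iter
    (fun (team, dir, st) ->
      Printf.printf "%-15s %-30s %s\n" team dir
        (match st with Passed -> "passed" | Missing -> "FAILED or not covered"))
    rows
```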

I'm thinking there could be some kind of general summary at the end of a build, listing all the failures. Alternatively, there could be a sexp output listing the rules that failed, perhaps with a backtrace for each showing the chain of rules that led to its execution. It would probably require some postprocessing to get a human-readable report.
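To make the proposed sexp summary more concrete, here is a sketch of what one entry, and its post-processing into a human-readable line, could look like. The record fields, the output shape and the example values are all hypothetical; dune does not currently emit this. Atoms are quoted with OCaml's %S escaping, which is sufficient for simple strings.

```ocaml
(* Hypothetical record for one failed rule: what failed, the error, and the
   chain of aliases/targets that caused it to run (outermost first). *)
type failure = { rule : string; error : string; backtrace : string list }

(* Quote an atom using OCaml string escaping. *)
let atom s = Printf.sprintf "%S" s

let sexp_of_failure f =
  Printf.sprintf "((rule %s) (error %s) (backtrace (%s)))" (atom f.rule) (atom f.error)
    (String.concat " " (List.map atom f.backtrace))

(* The whole machine-readable report is a list of failures. *)
let sexp_of_report fs = "(" ^ String.concat "\n " (List.map sexp_of_failure fs) ^ ")"

(* Post-processing into something a human can scan quickly. *)
let human_of_failure f =
  Printf.sprintf "%s: %s\n  required by: %s" f.rule f.error (String.concat " -> " f.backtrace)
```

Keeping the backtrace outermost-first means the first element is the alias that was requested on the command line, which is usually what a team lead wants to filter on.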

The third struggle is reporting a failure as early as possible during the build. If my whole CI build takes 20 minutes but an error is detected after 2, I could immediately mark the build as red and send some kind of notification. That would make iterations much faster. I don't have a great idea of how this should work. Since some external command would have to be launched to interact with the CI environment or to create notifications, I'm guessing this would be better done through the RPC.
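One stopgap, until something like this exists in the RPC, is to pipe dune's output through a small watcher that fires a notification on the first error while the build keeps going. This is a heuristic sketch: it assumes error diagnostics appear on lines starting with "Error" (as in "Error: ..." or "Error (warning 32 ...)"), which is human-oriented output rather than a stable interface, and notify is a placeholder for whatever the CI environment provides.

```ocaml
(* Heuristic: compiler and dune errors are usually reported on a line starting with "Error". *)
let looks_like_error line =
  let prefix = "Error" in
  String.length line >= String.length prefix
  && String.sub line 0 (String.length prefix) = prefix

(* Read [ic] to the end, echoing every line, and call [notify] once on the first
   error. Reading continues afterwards so later failures still reach the log.
   Returns the number of error lines seen. *)
let watch ?(notify = fun _ -> ()) ic =
  let notified = ref false and errors = ref 0 in
  (try
     while true do
       let line = input_line ic in
       print_endline line;
       if looks_like_error line then begin
         incr errors;
         if not !notified then (notified := true; notify line)
       end
     done
   with End_of_file -> ());
  !errors
```

A hypothetical driver would call watch with a notify callback on stdin and exit non-zero when the count is positive, with the build piped in as dune build @all @runtest 2>&1 | ./watch.exe.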

Any opinion on this whole issue?

Khady · Mar 12 '24 06:03

It sounds like there could be a display mode that outputs build data in JSON or some other format, so that tooling can parse it more easily, with granular information about artifacts, errors, etc. Cargo has a --message-format=json argument, documented here.
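For illustration, here is a sketch of what line-delimited JSON events from dune might look like, loosely modeled on Cargo's one-object-per-line messages that carry a "reason" field. The event kinds and field names are hypothetical and do not describe an existing dune format.

```ocaml
(* Hypothetical build events. *)
type event =
  | Artifact of string (* a target that was built successfully *)
  | Failed of string * string (* target, error message *)
  | Alias_done of string * bool (* alias, whether everything in it succeeded *)

(* Minimal JSON string encoder: escapes quotes, backslashes and control characters. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 -> Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

let to_json = function
  | Artifact p -> Printf.sprintf "{\"reason\":\"artifact\",\"path\":%s}" (json_string p)
  | Failed (t, msg) ->
      Printf.sprintf "{\"reason\":\"failed\",\"target\":%s,\"message\":%s}" (json_string t)
        (json_string msg)
  | Alias_done (a, ok) ->
      Printf.sprintf "{\"reason\":\"alias\",\"alias\":%s,\"success\":%b}" (json_string a) ok

(* One JSON object per line, so consumers can process events as they arrive. *)
let emit ev = print_endline (to_json ev)
```

Emitting one object per line lets a consumer react to a failure as soon as it is printed, which also covers the early-notification use case.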

jchavarri · Mar 12 '24 11:03

> The third struggle is how to report a failure as early as possible during the build. If my whole CI build takes 20 minutes but an error has been detected after 2, I could immediately mark the build as red and create some kind of notification. It would make iterations much faster. I don't have a great idea on how this should work. Because some kind of external command would have to be launched to interact with the CI environment or to create notifications I'm guessing that this would better be done through the rpc

I think this is already handled via --stop-on-first-error.

The rest of your issues are probably best addressed by extending dune's RPC. We already have a diagnostics API for reporting errors from the underlying build actions. I suppose we could add an API to determine whether a particular alias was built (to determine whether a test passed or failed). I'd suggest that you look at the RPC and see what's missing. If you're really tenacious, try proposing ways to extend it to cover your use cases.
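To make the suggested alias query concrete, here is a sketch of how a CI script might consume it. The alias_status type and the query function are hypothetical: they are not part of dune's existing RPC, and the transport is left as a parameter so the decision logic can be tested without a running dune.

```ocaml
(* Hypothetical status an extended RPC could report for one alias.
   NOT part of dune's current RPC; it only illustrates the query discussed. *)
type alias_status =
  | Succeeded
  | Failed of string list (* diagnostics attributed to this alias *)
  | Not_built (* not requested, or the build stopped before reaching it *)

(* [query] stands in for the transport (e.g. an RPC round-trip). *)
let deployable ~(query : string -> alias_status) aliases =
  List.filter (fun a -> query a = Succeeded) aliases

(* Aliases that failed, with their diagnostics, for the end-of-build report. *)
let failures ~(query : string -> alias_status) aliases =
  List.filter_map
    (fun a ->
      match query a with Failed diags -> Some (a, diags) | Succeeded | Not_built -> None)
    aliases
```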

rgrinberg · Mar 24 '24 21:03

> I think this is already handled via --stop-on-first-error.

I don't think that would work for us, because we also want to know about all the other things that might be broken, not just the first one.

Khady · Mar 25 '24 02:03