Allow targets that depend on test results
Description of the feature request:
We want to write a rule that consumes test results. Depending on a test target only gives access to the test executable and whatever other providers we implement, but there is no way to depend on the results of running the test.
Feature requests: what underlying problem are you trying to solve with this feature?
We have custom reporting and metrics that we want to extract from completed tests.
- The reporting is a roll-up of many tests so it can't just be added to the end of the testing action itself.
- It also depends on additional metadata generated during the build actions that produce the test executables, so we want to keep them in-sync with the outputs from running the tests.
- The post-processing is somewhat expensive, so we would like to run it with remote execution and as a separate target.
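To make the shape of the request concrete, here is a hypothetical Starlark sketch of the kind of rule we would like to be able to write. Nothing like `TestResultInfo` exists today (it stands in for "the files produced by actually running the test"), and the aggregator label is made up:

```python
# Hypothetical sketch only: TestResultInfo does not exist in Bazel today.
def _test_report_impl(ctx):
    report = ctx.actions.declare_file(ctx.label.name + ".report")

    # The missing piece: the outputs of *running* each test, kept in sync
    # with the metadata emitted by the rules that built the executables.
    test_outputs = depset(transitive = [
        dep[TestResultInfo].outputs
        for dep in ctx.attr.tests
    ])

    ctx.actions.run(
        executable = ctx.executable._aggregator,
        arguments = [report.path] + [f.path for f in test_outputs.to_list()],
        inputs = test_outputs,
        outputs = [report],
        mnemonic = "TestReport",
    )
    return [DefaultInfo(files = depset([report]))]

test_report = rule(
    implementation = _test_report_impl,
    attrs = {
        "tests": attr.label_list(),
        "_aggregator": attr.label(
            # Made-up label for our reporting tool.
            default = "//tools/reporting:aggregate",
            executable = True,
            cfg = "exec",
        ),
    },
)
```

Because this would be an ordinary action, it could run on remote execution and be cached like any other build step.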
What operating system are you running Bazel on?
macOS and linux.
What's the output of bazel info release?
release 3.0.0
Have you found anything relevant by searching the web?
No.
I have seen the built-in `bazel coverage` support, which generates a post-test action entirely within Bazel's Java code, but it is not configurable enough to hijack for our needs. Side note: a design that provides generic support for post-test actions would remove the need for a baked-in, special-cased coverage action and open the door for community-supported coverage that meets the needs of the diverse languages and targets the community maintains, as well as generic post-test reporting like what we require.
The closest thing that I can come up with (sketched below) is to make a `bazel run` target that pulls in the test metadata from providers and adds it as run data, then looks in the testlogs location to extract the data from the test run. This is extremely error-prone, though, since we either need to:
- assume that the test data is up to date with the tests themselves.
- run the tests explicitly in the `bazel run` script, which then needs to know all the correct options to pass along to the `bazel test` invocation for the user's setup.
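Roughly, that workaround looks like the sketch below (the labels and script are made up); the script then has to locate the logs under `bazel info bazel-testlogs` on its own:

```python
# Made-up sketch of the fragile `bazel run` workaround.
sh_binary(
    name = "extract_metrics",
    srcs = ["extract_metrics.sh"],
    # The test metadata is pulled in as run data, but the script still has
    # to find the matching logs under `bazel info bazel-testlogs` and assume
    # they are up to date with these inputs.
    data = ["//metrics:test_metadata"],
)
```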
Test results and metadata (logs, undeclared outputs) are recorded in the build event output stream. Build event protocol is the API for observing events that happened during a build or test. Have you looked into using that instead?
https://docs.bazel.build/versions/master/build-event-protocol.html
I was aware of the BEP, but I took a closer look just in case. However, I still don't see how it solves my problem.
It may be a more reliable way for a `bazel run`-based script to find the test outputs than scraping the `bazel info bazel-testlogs` location, but it doesn't let me add any actions to the build graph that can consume the test outputs. I still need to do the processing outside of the `bazel test` operation, which means that I can't effectively demand more than one, or use remote execution, or cache the results, and I'm still stuck figuring out how to keep the results and metadata in sync on my own.
We also have a similar request/need.
Our specific use case/problem:
- We use bazel remote execution configured with build without bytes
- We use platform rules and mark rules with `exec_compatible_with` denoting the requirements of the test (e.g. GPU, CPU1, CPU8, DISPLAY, etc.)
- We have some complex tests that run in sequence but have different inputs from different parts of our source tree, and the test results could be cached as such (i.e. most of the time we don't need to rerun certain parts of the test).
- These tests also sometimes have stages that require a GPU and other stages that only need CPU.
Detailed example:
We have a DNN model that runs across a large set of data and takes 50% of the total time; this part of the test requires a GPU. After the model is done, some metadata for each frame is put into a JSON file, placed in the undeclared outputs, and bundled into `outputs.zip`.
A second section of code then runs that needs this metadata from the model output as well as other data from the test inputs, and combines them together. This part of the code only needs CPU, is quite expensive, and also uses ~50% of the total time.
Example:
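As a rough illustration, the structure we would like to express might look like this; the stage rules and attributes here are made up, and today the second target cannot consume the first one's test outputs at all:

```python
# Made-up rule names; a test's outputs cannot currently feed another target.
dnn_inference_test(              # GPU stage, ~50% of the total time
    name = "model_inference",
    data = ["//data:videos"],
    exec_compatible_with = ["//platforms:gpu"],
)

metrics_aggregation_test(        # CPU stage, the other ~50%
    name = "combine_metrics",
    # Would need the per-frame JSON from model_inference's outputs.zip,
    # plus additional test input data.
    results = [":model_inference"],
    data = ["//data:frame_ground_truth"],
    exec_compatible_with = ["//platforms:cpu8"],
)
```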

There are other examples where we could use this kind of ability. For instance, if a test takes a very long time to execute, we could potentially run two parts of the test independently of each other instead of in sequence (e.g. if semantic segmentation has no state between frames, you could run one video file sharded across N nodes and then combine the results to compute aggregated statistics across all frames). This is almost achievable using Bazel's sharding, but since sharding cannot collect all the shard results and process them together, it isn't useful here.
One way we can make this work is to write custom Bazel rules that run these steps as genrules or `.bzl` rules and then combine them with a final test rule. The problem with this approach is that we'd have to use `--build_tag_filters`, or else the expensive steps run as part of ordinary builds, and we only want to run them when a test is requested on the target.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.
@bazelbuild/triage Not stale.
Is there any progress on this? Especially the caching of long-running tests would help us save a lot of compute resources in such a divide-and-conquer scenario.
Another use case I have: "tests with canonization". Run some code and compare the output artifacts it produces to a saved copy. If they're equal, the test passes; if not, it fails, but you have an easy way to update the canonical data. Right now this is implemented as a test with two modes, check and canonize. The problem is that when the main part that produces the data is expensive, I want to cache it and have the second call simply copy the results from the cache. So I would have two tests:
1. checks that the code runs and produces some result
2. checks that the results are equal to the expected ones, or updates them in canonization mode

Test 2 depends on test 1.
So if on the first run test 1 succeeds but test 2 fails, I rerun in canonize mode: test 1 is served from the cache and test 2 updates the expected results.
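In target form, the desired structure might look roughly like this (rule names and attributes are made up; it is not expressible today because a test's results cannot feed another target):

```python
# Made-up rules: an expensive producer test plus a cheap comparison test.
produce_results_test(
    name = "run_pipeline",          # expensive; its results should be cacheable
    srcs = ["pipeline_test.py"],
)

canonization_test(
    name = "check_canonical",
    results = ":run_pipeline",      # would reuse run_pipeline's cached outputs
    canonical = "//testdata:expected",
)
```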
One potential workaround that might be acceptable is to run the test executable as a genrule tool. This allows the downstream dependents to depend on the outputs, and leverages the bazel dependency graph to run everything in parallel, cache outputs, etc.
However, it is unpleasant because the tool gets built under the host configuration (not the test configuration), so you end up building everything twice, and you might actually desire the test configuration to be used to generate the outputs.
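A minimal sketch of that workaround, with a made-up test label and flag:

```python
# Run the test executable as a genrule tool (outside of `bazel test`) and
# expose its report as an ordinary build output.
genrule(
    name = "test_metrics",
    outs = ["metrics.json"],
    cmd = "$(location //some/pkg:expensive_test) --write_metrics_to=$@",
    tools = ["//some/pkg:expensive_test"],  # built in the host/exec configuration, not the test configuration
)
```

Downstream rules can then depend on `:test_metrics` like any other target.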
@lberki could this be achieved by (ab)using `--run_under`?
@meisterT that's theoretically possible, although it heavily constrains what one can do with the results, because the binary would run as part of the test action. As such, it'll always be a single binary, one can't express dependency graphs of multiple actions, and it will be sharded with the test and run on the same hardware as the test does. So it's not fantastic.
Unfortunately, I don't have a good answer there: making it possible for regular rules to depend on results of tests would be a pretty difficult refactoring for us and absent overwhelming public support, we have better places to invest our time in.
So I'll (with some sadness) close this issue as "won't fix".
I wrote an initial design proposal about this last year. There have been other things for me to work on, so I have not pursued it any further.