bazel-buildfarm icon indicating copy to clipboard operation
bazel-buildfarm copied to clipboard

[feature] [executor] detect action correctness (aka nondeterminism / cache poisoning)

Open luxe opened this issue 4 years ago • 2 comments

Summary

We extend buildfarm's executor to verify action correctness. Below is a detailed explanation of what the problem is and how it has been solved in the past. Then we propose a server-side solution and demonstrate how it works.

Background Information

File Presence

Bazel guarantees the presence of input and output files for each action execution because these files are statically defined in the build graph. If any input or output file is missing, bazel knows to fail the responsible action. This is the same for REAPI as inputs and outputs are part of the action definition. We are not interested in the presence of these files, but instead verifying their content.

Action Correctness

We define "action correctness" as:

Given the same inputs, the same action will produce the same outputs. Being able to use cached artifacts, because builds are merkle trees (checksums all the way down)

The Problem

Bazel's model requires that the content of each output be deterministic, however this is not easy to verify.
Here is an example of a non-deterministic action:

genrule(
    name="whoops",
    srcs = [],
    outs = ["whoops.txt"],
    cmd = """
    echo "$$(cat /dev/urandom | head -c 64 | base64) $(date)" > $(OUTS)
    """",
)

This genrule is valid according to bazel and it will work in a sandbox. However if you run the action twice, you will get two outputs from the same set of inputs (in this case, the empty input). We want to find and report all of these issues.

Common Sources of Non-Determinism

  • Random numbers
  • Sorting of hashmaps & dictionaries
  • Timestamps
  • Incorrect rule implementations (example: sandboxing disabled and action uses a file that it hasn't declared as input. Actions may run in different order causing different outputs)
  • Sourcing system information (the nvcc compiler use to inject a PID into typenames resulting in different objects)

Signs of Non-Determinism

  • The cache keeps growing
  • Rebuilding the same commit runs actions again (for example: users should expect 100% cache hit building master)

Previous Solutions

Detecting non-determinism has traditionally been done on the client side. See the following BazelCon video which also explains the problem and provides a client solution.
Here is a summary of that solution:

1. Create logs

  1. bazel build --noremote_accept_cached --experimental_execution_log_file=run1.log
  2. bazel clean --expunge
  3. bazel build --noremote_accept_cached --experimental_execution_log_file=run2.log

2. Sort logs

Because bazel's graph is not always evaluated in the same order, the log entries need sorted

bazel-bin/src/tools/execlog/parser \
--log_path run1.log \
--log_path run2.log \
--output_path run1.txt \
--output_path run2.txt

3. Compare logs

A regular file diff might be too noisy, so checking each action programatically for input/output hash differences has worked best.

Problems:

This solution does not scale well. If you needed to test a whole repository in a single invocation you will need to create 2 very large files on disk. Then these files need sorted, and you will need to give the execlog parser lots of JVM ram, but you will still likely run out of memory. You could instead build portions of the repository at a time, but now you need to decide how to run many builds to cover the whole repo which is also a lot of duplicate work. You may also want to run an action more than twice, which grows the number of logs that need compared. We want something at the executor level that does not require external tooling. Something like --runs_per_test=N but for determinism (like: --check_determinism_per_action=N).

Proposed Solution:

We extend buildfarm's executor to verify action correctness. It can be done on a per-action context by re-running the action multiple times. This avoids all the scaling issues on the client side. This problem can now be solved without external tooling and can function like a normal build/test pipeline.

Examples:

Deterministic Output

def deterministic_output(name):
    target_name = name
    outfile = name + "_deterministic.txt"

    native.genrule(
        name = target_name,
        srcs = [],
        outs = [outfile],
        cmd = """
              echo 'hello world' > $(OUTS)
              """,
    )
    
deterministic_output("is_deterministic")

We can confirm that the action is deterministic since it creates the same outputs each time and passes:

bazel test --test_output=streamed \
--noremote_accept_cached \
--nocache_test_results \
--config=remote \
--remote_default_exec_properties='check-determinism=10' \
//code/programs/example_tests/nondeterminism:is_deterministic  

INFO: Found 1 target and 0 test targets...
Target //code/programs/example_tests/nondeterminism:is_deterministic up-to-date:
  bazel-bin/code/programs/example_tests/nondeterminism/is_deterministic_deterministic.txt
INFO: Elapsed time: 8.253s, Critical Path: 8.19s
INFO: 2 processes: 1 internal, 1 remote.
INFO: Build completed successfully, 2 total actions

Non-Deterministic Output

def create_file_with_randomness(name):
    target_name = name
    outfile = name + "_random.txt"

    native.genrule(
        name = target_name,
        srcs = [],
        outs = [outfile],
        cmd = """
              echo $$RANDOM > $(OUTS)
              """,
    )
    
create_file_with_randomness("random_file")

We can confirm that the action is not deterministic since it fails and shows us the different output hashes:

bazel test --test_output=streamed \
--noremote_accept_cached \
--nocache_test_results \
--config=remote \
--remote_default_exec_properties='check-determinism=10' \
//code/programs/example_tests/nondeterminism:random_file

/tmp/worker2/shard/operations/6595539a-b8de-485a-9fb8-dbdc074cf9c9/bazel-out/k8-fastbuild/bin/code/programs/example_tests/nondeterminism/random_file_random.txt has non-deterministic output content:
{hash: "f08fe05461679e8287ea9a730c436130efdcc0d942669c6dffc67405af0f1806"
size_bytes: 5
=1, hash: "99fdb2b05d9b6a7913d0003e2f30e58a266db6fd5cada0df58212ced1dc011a7"
size_bytes: 5
=1, hash: "e09af55a84dd362545a29655895f475b2003496e13ddd6cc7e3dd74e23657262"
size_bytes: 5
=1, hash: "1dadda8d0b1d21ad9a73cfe7f7108a6c3bf8a8488436d27a74241f6fbd5be1fe"
size_bytes: 6
=1, hash: "95203d34940cc692a350d0df31bff44b2314f0630ea6e50b5636cf9feff21555"
size_bytes: 5
=1, hash: "e9e68ac0d161c58bc16a7c35f7668fbbcd1467589fcab43c8f8bdcbb3aa577c2"
size_bytes: 6
=1, hash: "4bb557d72ea3698236dd6b001e6d574ccd553ccf4eb68337e84f46fe3c7db707"
size_bytes: 6
=1, hash: "e6ba136fca30a87917b3f19557111f7ef6416b41c97d70f74ae680ab42745e71"
size_bytes: 6
=1, hash: "58f324108d121cf6db3f474460cddae0deb18ed351e863a00cf32b39c20387cd"
size_bytes: 4
=1, hash: "3a8377da71829f623e127187c0031a6e57759870f52e6fd64ad41c2ea7982ee8"
size_bytes: 6
=1, hash: "5226dc30b097eee3b620f0cf674f7bd59ebfb0f2b2a7b1137efb8e5036cc135a"
size_bytes: 6
=1}
Target //code/programs/example_tests/nondeterminism:random_file failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 13.074s, Critical Path: 13.01s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

We know which output file was nondeterministic and we can also see that its different every time given the 10 hashes. ^
Here is an example where the action only produces different outputs 10% of the time:

def create_file_with_different_data_10_percent_of_time(name):
    target_name = name
    outfile = name + "10_percent_random.txt"

    native.genrule(
        name = target_name,
        srcs = [],
        outs = [outfile],
        cmd = """
              random_number=$$(shuf -i 1-10 -n 1)
              if [ "$$random_number" == "1" ]; then echo "1" > $(OUTS); else echo "not 1" > $(OUTS); fi
              """,
    )

Running 100 times we can see there is only 2 different hashes with the expected frequency:

bazel test --test_output=streamed \
--noremote_accept_cached \
--nocache_test_results \
--config=remote \
--remote_default_exec_properties='check-determinism=100' \
//code/programs/example_tests/nondeterminism:10_percent_different

/tmp/worker/shard/operations/50de3cd7-6e18-4654-9bec-a31584bce496/bazel-out/k8-fastbuild/bin/code/programs/example_tests/nondeterminism/10_percent_different10_percent_random.txt has non-deterministic output content:
{hash: "4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865"
size_bytes: 2
=11, hash: "da11c76f32461ea147bb84d8f06fa75c0014383a76fca6995892101f37261583"
size_bytes: 6
=90}
Target //code/programs/example_tests/nondeterminism:10_percent_different failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 7.284s, Critical Path: 7.23s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfull

luxe avatar Jul 06 '21 19:07 luxe

From a cursory look, this sounds like a great idea. I actually think the implementation is a little narrow-scoped though: with the worker doing all of the logic for this, you're not leveraging much of the parallelism of buildfarm to do this check: I think it should be done in the scheduler, with N multiplied queue entry submissions, so that all the workers have the opportunity to execute and expose it to way more possible sources of determinism pollution (host, configuration, load, hardware, network, clocks and timing, etc), with the scheduler watching for each response and formulating a decorated operation response.

One better is that this really shouldn't just be a client property - buildfarm should do this check intrinsically if configured, to improve health as a feature, rather than just an opt in.

werkt avatar Oct 29 '21 04:10 werkt

thank you for feedback. we had discussed this in our meetup. Do you want to land, or hold off for better implementation?

luxe avatar Jan 27 '22 07:01 luxe