zig build.zig: issue with caching with chained build.step.Run steps

Zig Version

0.13.0-dev.56+956f53beb

Steps to Reproduce and Observed Behavior

Consider build.zig logic where one build.step.Run step takes the output of another such step as an input:

    const step1 = b.addRunArtifact(extracter);
    step1.addFileArg(b.path("src/foo.c"));
    const foo_header = step1.addOutputFileArg("foo.generated.h");

    const step2 = b.addRunArtifact(generator);
    step2.addFileArg(foo_header);
    const rpc_bindings_header = step2.addOutputFileArg("foo.rpc_bindings.h");

    // Dummy installs so that the output state is visible.
    b.getInstallStep().dependOn(&b.addInstallFileWithDir(foo_header, .header, "foo.generated.h").step);
    b.getInstallStep().dependOn(&b.addInstallFileWithDir(rpc_bindings_header, .header, "foo.rpc_bindings.h").step);

Assume that step2 is much more expensive than step1. If src/foo.c changes, we obviously need to re-run step1, but step2 should only need to re-run if foo_header actually changed as a result.

But any new state of src/foo.c causes both step1 and step2 to run, even when the foo.generated.h output is identical. This is because the complete path of the generated foo_header depends on the input state of step1. In turn, the hash calculated by build.step.Run.make in step2 depends not only on the actual contents of the addFileArg() file but also on its path, which leaks the hash of step1's inputs.
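To make the effect concrete, here is a toy model of it, not the actual std.Build cache code. The function name keyWithPath and the directory names aaaa and bbbb are made up for illustration; the only thing taken from the description above is that the key covers both the full generated path and the file contents.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Toy cache key (hypothetical helper): hashes the full path and the contents.
/// A zero byte separates the two fields so they cannot run into each other.
fn keyWithPath(full_path: []const u8, contents: []const u8) [Sha256.digest_length]u8 {
    var h = Sha256.init(.{});
    h.update(full_path);
    h.update("\x00");
    h.update(contents);
    var out: [Sha256.digest_length]u8 = undefined;
    h.final(&out);
    return out;
}

test "identical header contents, different generated path, different key" {
    const header = "int foo(int);\n";
    // Two hypothetical output directories, one per distinct state of src/foo.c.
    const key_a = keyWithPath("zig-cache/o/aaaa/foo.generated.h", header);
    const key_b = keyWithPath("zig-cache/o/bbbb/foo.generated.h", header);
    try std.testing.expect(!std.mem.eql(u8, &key_a, &key_b));
}
```

Saved to a file, this can be checked with `zig test`. The header bytes are the same in both calls, yet the keys differ, which is the cache miss seen in step2.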

Complete executable test case: https://github.com/bfredl/zig-run4run . The first step extracts only the prototypes of the C functions, so changing implementation_of_foo() to implementation_of_foo2() illustrates the behavior.

Expected Behavior

When doing an incremental rebuild, if a change to src/foo.c causes step1 to run but it still produces the same "foo.generated.h" contents as an earlier build, do not run step2; instead, use the existing cached foo.rpc_bindings.h from that earlier build.

I think this could be done in two different ways:

1. Change the full path of foo_header.getPath() to depend only on that file's contents, not on all inputs to step1.
2. Somehow make the input hash of step2 be calculated only from the file's contents and intended file name, not from the full prefixed path (like zig-cache/o/{HASH}/foo.generated.h).
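As a rough sketch of the second idea, the key could be built from the intended file name and the contents only. This is a toy model under assumptions, not a patch against std.Build: keyWithoutPrefix is a hypothetical helper, and the directory names aaaa and bbbb are invented.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Toy cache key (hypothetical helper): hashes only the base name and the
/// contents, ignoring the zig-cache/o/{HASH}/ prefix of the generated path.
fn keyWithoutPrefix(full_path: []const u8, contents: []const u8) [Sha256.digest_length]u8 {
    var h = Sha256.init(.{});
    h.update(std.fs.path.basename(full_path));
    h.update("\x00"); // separator between file name and contents
    h.update(contents);
    var out: [Sha256.digest_length]u8 = undefined;
    h.final(&out);
    return out;
}

test "same contents under different prefixes give the same key" {
    const header = "int foo(int);\n";
    const key_a = keyWithoutPrefix("zig-cache/o/aaaa/foo.generated.h", header);
    const key_b = keyWithoutPrefix("zig-cache/o/bbbb/foo.generated.h", header);
    try std.testing.expect(std.mem.eql(u8, &key_a, &key_b));
}

test "changed contents still change the key" {
    const key_a = keyWithoutPrefix("zig-cache/o/aaaa/foo.generated.h", "int foo(int);\n");
    const key_b = keyWithoutPrefix("zig-cache/o/aaaa/foo.generated.h", "int foo(long);\n");
    try std.testing.expect(!std.mem.eql(u8, &key_a, &key_b));
}
```

With a key like this, a re-run of step1 that reproduces the same header would map to the previous step2 cache entry, while a real change to the header would still invalidate it.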

I experimented a bit with the second option but I couldn't make it work myself.

Apr 30 '24 12:04 bfredl

Option (2) (not including the full input path in its hash) seems like it would be backwards incompatible to me: currently the file path could very well influence the behavior of the run step (e.g. by being passed as an argument to other programs, written to a file, etc.). I assume we would want the build.zig file to specify when to trigger this behavior via the API somehow.

Option (1) (moving generated files to a path based on their contents) seems like a more general solution. We would still need a way to find a file based on its inputs (or their hash), e.g. via a symlink, or via a file that records the content's hash and acts like one.

Just FYI, I think providing a file's content to a run step via stdin already uses the file content's hash, which might be an appropriate workaround for you in the meantime.

Apr 30 '24 18:04 rohlem

Unfortunately, step2.setStdIn(.{ .lazy_path = foo_header }) does not work as a workaround. It still adds the full generated path of foo_header to the hash, same as for addFileArg(foo_header).

May 04 '24 08:05 bfredl