please icon indicating copy to clipboard operation
please copied to clipboard

Add support for "cache based" incremental builds

Open jakec-github opened this issue 3 years ago • 5 comments

Many tools in the JS ecosystem (yarn/webpack) do not support true incremental builds. They must run a large builds in a single step. Just like when using Please, first time you run a build everything has to be done from scratch. These tools optimise further builds using a kind of "cache based" technique. Instead of splitting the build, on every run they output a cache artefact. This is then used to reduce subsequent build times.

Please does not support this. A build target (for obvious reasons) cannot depend on itself. When using these tools in please targets they run from scratch each time resulting in painful build times.

Please could support this by exposing a "performance_caches" argument on genrule. When building the target the contents of that directory should be included. If this input is included in the hash of inputs the rule will always get built twice after every change. Once with the old cache (or no cache at all) and once with the updated cache. An alternative approach would be to exclude it from the hash. Users of this feature would have to be confident that the tools output is deterministic regardless of the contents of the cache. But are already expected to be sure that the tools output is deterministic anyway so this isn't a massive leap.

jakec-github avatar Jun 01 '22 16:06 jakec-github

I'm not sure I see how this is different from having two separate build actions - one that outputs the performance_caches and another one that consumes that as an input?

peterebden avatar Jun 07 '22 07:06 peterebden

So I guess the problem here is that some underlying tools only produce this performance cache as part of the build process itself. It can't be produced by a standalone build action. Also, would Please not want to rebuild the performance_caches target every time the inputs change rather than reusing the previous one?

jakec-github avatar Jun 08 '22 13:06 jakec-github

I remember thinking about this sometime ago and it always led to me to a chicken-egg situation, which is related to how Please traditionally works.

ie. input1 -> cmd1 -> out1 and out1 -> cmd2 -> out2

Some JS tools (as Jake mentioned) can't break things down as nicely as we would want. They need to take all required sources in and place the artifacts in some directory (i.e. node_modules/babel/.cache), so that the next time the command is run the tool will look in the cache directory for speeding up the re-build.

How Please works and how the JS ecosystem does things can be quite orthogonal. Let's say that you have a top level rule that runs babel, if there's a BUILD file along the node_modules/babel/.cache path, the .cache directory cannot be changed by top level package (or any different one). I'm sure there are other potential issues with supporting this alongside the current Please incrementality functionality that I'm missing here.

Although this would make things easier for some JS tools, I can't see how it can't be integrated. Perhaps, some of the guys might know whether this is possible or not.

tiagovtristao avatar Jun 13 '22 10:06 tiagovtristao

I don't like the idea of making the build model more complex for this - at present targets get built exactly once (at most once if you include things that aren't required). There would have to be quite a loop of custom code to handle this case.

I still think this could be modelled as two actions:

genrule(
    name = "one"
    outs = ["node_modules/babel/.cache"],
    cmd = "babel stuff",
)

genrule(
    name = "two",
    srcs = [":one"],
    outs = ["something.js"],
    cmd = "more babel stuff, hopefully faster",
)

I think that is pretty much equivalent; in a clean build both have to be run, but on reruns the cache is populated so presumably it is faster.

peterebden avatar Jun 13 '22 11:06 peterebden

I talked to @Tatskaari about this in person. He mentioned that my idea of excluding the performance_caches from the input hash might not be possible if using the remote execution API. At that point we would have to double build these kind of targets after every edit.

In all honesty, I agree that however you do this it would add complexity. These kind of tools just don't work well with artefact based build systems. Hopefully, they will gain more traction in the JS community and people will start designing their tooling with compatibility in mind.

On your suggestion @peterebden I don't think we can improve things that way because action one will run after every change and will do so slowly

jakec-github avatar Jun 17 '22 13:06 jakec-github