build icon indicating copy to clipboard operation
build copied to clipboard

More granular invalidation

Open davidmorgan opened this issue 10 months ago • 13 comments

For generators using the analyzer, it would be good to rerun them less often, based on a finer-grained analysis of what changed.

davidmorgan avatar Feb 01 '25 09:02 davidmorgan

@rrousselGit asked

I would be interested in making a third-party API for builders to optimize recomputation of code-generators.

For that purpose, could we maybe have a way to opt-out to the default "listen to all imported libraries"? Followed by a listen(library)

there are a few parts to that problem :)

The first is whether we are willing to give generators an API that can return incorrect results. A generator can ask the analyzer about any code; if we don't rerun the generator when something changes, then maybe it asked about some transitive dependency and now we are using incorrect output.

In general I think the answer is "we want it to be easy for a generator to be correct", so it's up to build_runner to handle this, not every author of a generator.

I realize that "and it will be slow" is not an acceptable answer here either :)

For deps that are outside the current library cycle, there is already code to do better invalidation, it's used in google3. I'm not sure if it's wired up for build_runner today. The way it works is that the analyzer reports which inputs were unused and build_runner uses that to do better invalidation. I'll have to check.

For deps that are inside the current library cycle, this doesn't work because the analyzer does read all the files in the library cycle so they are inputs. Here is where we would need new code to support finer-grained invalidation. There is definitely interest in making this work.

But, I'll be looking at the raw performance first as there seem to be easy wins there :)

davidmorgan avatar Feb 01 '25 09:02 davidmorgan

In general I think the answer is "we want it to be easy for a generator to be correct", so it's up to build_runner to handle this, not every author of a generator.

For that purpose, it could be a low-level API that's primarily used by package authors.

If it's obscure enough, it won't get used by the average folks

rrousselGit avatar Feb 01 '25 10:02 rrousselGit

Hmmmmm we could experiment with giving generators access to the "unused inputs" feature.

So the generator would receive a set of the files build_runner thinks it might have used, and the generator can explicitly say "no I didn't use it".

There are some edge cases that are going to be horrible to debug: if a user writes code that introduces a dependency that the generator didn't figure out, then they're going to be staring at a stale build and not know why.

But that does seem much less important than much-too-frequent rebuilds, and it's likely to work just fine for common cases even if the generator is not strictly correct.

This is analogous to how bazel works: it gives every build action the capability to say "here are the inputs I didn't use", https://bazel.build/rules/lib/builtins/actions#parameters_5, the unused_inputs_list. It just trusts the build action, if the action reports wrong unused imports then you just get wrong invalidation.

davidmorgan avatar Feb 01 '25 10:02 davidmorgan

We can already report an asset as unused. But we lack the list of currently used assets, so if a generator wants to report an asset as unused, a bunch of work is necessary.

It's generally much easier to report "I used this" than "I didn't use this"

rrousselGit avatar Feb 01 '25 11:02 rrousselGit

The reason we support "I didn't use this" is because that is what bazel supports - and it also maps well to how we run the compilers.

I agree it is a slower and somewhat weird way of doing things though, so we might want to explore something else.

For deps that are outside the current library cycle, there is already code to do better invalidation, it's used in google3. I'm not sure if it's wired up for build_runner today. The way it works is that the analyzer reports which inputs were unused and build_runner uses that to do better invalidation. I'll have to check.

This tracking/reporting is based on access to summary files which were not read. We don't use summaries in build_runner, and it in general doesn't make sense for the build_runner model as it exists today to use them.

We would need a totally different model of dependency tracking for this (but it would be a great if we had this).

jakemac53 avatar Feb 03 '25 15:02 jakemac53

To be clear: Making a different model of dependency-tracking is exactly what I want to work on.

I have a pretty good idea of what I'd think would both:

  • prevent developers from using data without listening to it
  • make devs listen to only what they use

In fact, I'd like a way to be more granular than the library level.

When you have:

@annotation
class A {}

@annotation
class B{}

Changing A shouldn't invalidate B nor anything that depends purely on B.

But if I want to implement that today, afaik I pretty much have to make my own build_runner.

rrousselGit avatar Feb 03 '25 16:02 rrousselGit

We have much lower hanging fruit than trying to do better sub-library granularity invalidation, I think we should first focus on just trying to get whole libraries to not get invalidated when any transitive dep changes.

The main issue right now is actually just the tracking and invalidation when the number of transitive deps is large, since the transitive deps are tracked per input+builder. Trimming that set to immediate deps only in most cases, or even just making the tracking/hashing more efficient (through transitive deps hashes etc), will give significant wins here without compromising correctness.

jakemac53 avatar Feb 03 '25 17:02 jakemac53

I'm just looking for stuff I can do on my own that would be fun too. Controlling cache invalidation with a good API is my kind of thing ; but I don't see a way to give anything to the ecosystem there with how build is architectured.

I'd rather not fork build just for that.

rrousselGit avatar Feb 03 '25 17:02 rrousselGit

I'm just looking for stuff I can do on my own that would be fun too. Controlling cache invalidation with a good API is my kind of thing ; but I don't see a way to give anything to the ecosystem there with how build is architectured.

Fwiw the Resolvers implementation is something that can be swapped out at the build_runner_core level - but yeah you would lose some of the stuff build_runner does if you just build on top of build_runner_core.

We could potentially add some way of plugging in a custom Resolvers implementation into the generated build script - but I don't know exactly what it would look like and its probably not high priority.

jakemac53 avatar Feb 03 '25 18:02 jakemac53

I'm very much open to changing large things so that they work better, or so contributions are possible.

However it'll take me at least a few weeks to get up to speed, this is the first time I've worked closely in the build_runner codebase, although I've worked with it for a long time :)

So I suggest we revisit the question of what's possible here some time mid February.

Thanks.

davidmorgan avatar Feb 03 '25 18:02 davidmorgan

Will this issue improve the watch command build_runner watch?

amrgetment avatar Feb 08 '25 09:02 amrgetment

Yes, any dev cycle improvements that apply to build will also apply to watch :)

We don't have anything concrete here yet, however. The one I'm working on right now is #3811.

davidmorgan avatar Feb 08 '25 10:02 davidmorgan

The Flutter developers will love you once you finish it ❤

amrgetment avatar Feb 08 '25 12:02 amrgetment