build icon indicating copy to clipboard operation
build copied to clipboard

Long-term performance: AssetGraph

Open matanlurey opened this issue 7 years ago • 16 comments

I've uploaded the e2e_example/.../asset_graph.json (formatted).

It is about 1.5mb, for what basically is a "Hello World" with DDC. I'm not sure what a typical angular project or something a bit more substantial might look like - maybe not a huge deal?

A couple things I saw looking at the output:

  • We compile a lot of packages that aren't used at runtime. It doesn't seem like a huge deal until you realize we analyze it, create unlinked summaries, create linked summaries, create DDC'd JavaScript, create .errors. outputs, create source maps etc.

Maybe we need a way of excluding tooling-only packages manually (or with heuristics)?

  • Maybe the asset graph should be split into different parts per package, considering that only the nodes in the current package are actually likely to change?

I might misunderstand, though.

  • As @jakemac53 mentions originally in https://github.com/dart-lang/build/issues/41, maybe we just need an output format that is better for fast-incremental writes. I don't have much expertise here.

matanlurey avatar Dec 24 '17 05:12 matanlurey

We do have nodes in the asset graph for all those things, but we don't actually compute modules/summaries/ddc for anything that isn't imported by a real entrypoint - or at least not if you set them as "optional" which should be done for those actions as well as the module action.

jakemac53 avatar Dec 24 '17 05:12 jakemac53

Ah I see. That's good, though it still bloats the graph.

I do see some files in generated I didn't expect to see:

screen shot 2017-12-23 at 9 18 35 pm

... though I guess this all must be based on test/**.dart having CLI-based tests.

matanlurey avatar Dec 24 '17 05:12 matanlurey

Ya you could try creating a build.yaml which overrides the ddc_bootstrap_builder to only run on web entrypoints (that one is what actually causes things to get built).

I am not actually 100% sure we expose that builder separately though from the config.

jakemac53 avatar Dec 24 '17 05:12 jakemac53

(using generate_for)

jakemac53 avatar Dec 24 '17 05:12 jakemac53

we could also try to make it smart about not running on entrypoints that are clearly not web - but that is tricky

jakemac53 avatar Dec 24 '17 05:12 jakemac53

I mean I know we essentially do the same waste on Bazel to an extent, but I think (?) the difference here is the asset graph is more separated than our implementation. Anyway, haven't seen any obvious performance problems yet, though here is a build on my Macbook Pro:

[INFO] BuildDefinition: Reading cached asset graph completed, took 178ms
[INFO] BuildDefinition: Building new asset graph completed, took 471ms

I'm not 100% sure how to read this yet, but 200ms for a read doesn't seem bad considering it's only on a cold start. We should start to get more realistic numbers with stuff like the angular components gallery though.

matanlurey avatar Dec 24 '17 05:12 matanlurey

I'm not 100% sure how to read this yet, but 200ms for a read doesn't seem bad considering it's only on a cold start. We should start to get more realistic numbers with stuff like the angular components gallery though.

Yes - we haven't seen big enough issues with it yet to bother trying to optimize it. As soon as we do it should be relatively straightforward to replace the format as its broken out into separate classes that do the serialization/deserialization.

Some sort of format that allows incremental updating would be ideal - we could even consider using a local database or something. Pretty much everything is on the table.

jakemac53 avatar Dec 24 '17 05:12 jakemac53

[INFO] BuildDefinition: Reading cached asset graph completed, took 178ms [INFO] BuildDefinition: Building new asset graph completed, took 471ms

Were there other logs between that? It should only build a new asset graph if it invalidated the previous one, which it should give you a message about.

jakemac53 avatar Dec 24 '17 05:12 jakemac53

[INFO] ensureBuildScript: Generating build script completed, took 340ms
[WARNING] BuildDefinition: Throwing away cached asset graph because the build actions have changed. This could happen as a result of adding a new dependency, or if you are using a build script which changes the build structure based on command line flags or other configuration.
[INFO] BuildDefinition: Reading cached asset graph completed, took 178ms
[INFO] BuildDefinition: Building new asset graph completed, took 471ms
[INFO] BuildDefinition: Checking for unexpected pre-existing outputs. completed, took 1ms
[INFO] Build: Running build completed, took 19326ms
[INFO] Build: Caching finalized dependency graph completed, took 84ms
[INFO] Build: Succeeded after 19525ms with 984 outputs

matanlurey avatar Dec 24 '17 06:12 matanlurey

Ah looks like the logs are a bit confusing in that case because I think the log about the build actions changing happens while we are reading in the graph, and log the finished line. Probably not a huge issue but it is a bit confusing.

jakemac53 avatar Dec 24 '17 06:12 jakemac53

you could try creating a build.yaml which overrides the ddc_bootstrap_builder to only run on web entrypoints

build_web_compilers|ddc_bootstrap already only runs on ["web/**", "test/**.browser_test.dart"]. Is the problem that there are more *.browser_test.dart than will actually get run?

natebosch avatar Dec 27 '17 17:12 natebosch

I wonder how this has changed with optional builders. @natebosch?

matanlurey avatar Mar 08 '18 04:03 matanlurey

Optional builders still have nodes in the graph - they just might not be built.

jakemac53 avatar Mar 08 '18 15:03 jakemac53

(fwiw, ddc/summaries/modules have always been optional, or at least for a long time)

jakemac53 avatar Mar 08 '18 15:03 jakemac53

Ah my mistake, thanks.

matanlurey avatar Mar 08 '18 15:03 matanlurey

One particular thing we should look at is the memory usage of the AssetGraph. It looks like the VM representation can end up being way bigger than the json representation which is a hint that there is some canonicalization happening in our Json representation (we don't repeat strings) that needs to be happening for our AssetGraph. One thing that is likely happening is we're holding on to multiple copies of duplicate AssetIds (which have duplicate String references).

natebosch avatar May 31 '18 20:05 natebosch

Closing in favour of #3811

I'll be looking at the asset graph next: precisely to deduplicate anything that can be deduplicated and remove anything that can be removed.

davidmorgan avatar Mar 22 '25 10:03 davidmorgan