
Track macro dependencies at the library level for improved invalidation

Open jakemac53 opened this issue 1 year ago • 54 comments

The Problem

Today, whenever a library changes (its API, at least), all macros in all libraries that depend on that library (transitively) get re-run.

In pathological cases, this can result in incremental analysis times that are O(# affected libraries * average time to run macros per library). We can demonstrate today that when the number of affected macros reaches into the hundreds, this is noticeably slow (multiple seconds).

The need is most pressing here for the analyzer, since it will have to deal with this situation any time a character is typed in any file. Eventually, the CFE will also want to implement this for the incremental compiler, to improve hot reload performance.

Solution

We should track the dependencies of macro applications and only re-execute them if those dependencies change.

To start, "dependency" here should just be tracked at the library granularity. This should be sufficient to get what we want, without adding a huge amount of memory overhead. If desired we can later investigate more fine grained dependency tracking.

Implementation Details

For most API requests this should be relatively easy to implement - all Declaration objects sent to macros already have a Library associated with them. We should track all of those libraries in a set (this could potentially even be done in shared code). Once a macro is done, that set of libraries should be saved in such a way that it can be linked to the specific macro application, via an expando or a map, etc.
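A minimal sketch of this bookkeeping, with hypothetical `Library`, `MacroApplication`, and `DependencyRecorder` stand-ins (the real analyzer/macro types differ):

```dart
// Hypothetical sketch: record every Library a macro application saw,
// then attach the resulting set to the application via an Expando.
class Library {
  final Uri uri;
  Library(this.uri);
}

class MacroApplication {
  final String description;
  MacroApplication(this.description);
}

// Set of library dependencies, linked to each macro application.
final _dependencies = Expando<Set<Uri>>('macro dependencies');

class DependencyRecorder {
  final Set<Uri> seen = {};

  // Called whenever a Declaration (with its Library) is handed to the macro.
  void recordLibrary(Library library) => seen.add(library.uri);

  // Once the macro is done, link the set to the specific application.
  void attachTo(MacroApplication application) =>
      _dependencies[application] = Set.of(seen);
}

Set<Uri>? dependenciesOf(MacroApplication application) =>
    _dependencies[application];

void main() {
  final recorder = DependencyRecorder()
    ..recordLibrary(Library(Uri.parse('package:a/a.dart')))
    ..recordLibrary(Library(Uri.parse('package:b/b.dart')));
  final app = MacroApplication('MyMacro on class X');
  recorder.attachTo(app);
  print(dependenciesOf(app)); // {package:a/a.dart, package:b/b.dart}
}
```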

Whenever a library is invalidated by the API signature of a dependency changing, it should re-use the macro results it got previously, except for the results that depended on the library which caused the invalidation. Macros in the current library will always re-run, because they are always provided some declaration from the current library, and thus always depend on it.

The end goal is that in most cases only the macros in the file which was edited are re-run, or maybe a handful of others, but not all macros that might have been able to introspect on the changed library.

Harder cases

StaticType

Macros can ask for a StaticType, which encapsulates knowledge about the entire type hierarchy. We will need to record the libraries for all the types in that hierarchy when returning static types.

If we want the library tracking to be handled in shared code, we would need to add an Iterable<Library> field to the StaticTypeImpl so that we know which libraries a static type depends on.
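A sketch of what that could look like; `StaticTypeImpl` here is a simplified stand-in, and `recordTypeDependencies` is a hypothetical helper, not the real shared-code API:

```dart
// Hypothetical sketch: a StaticTypeImpl that carries the libraries its
// type hierarchy was built from, so shared code can record them as
// dependencies when the type is handed back to a macro.
class Library {
  final Uri uri;
  Library(this.uri);
}

class StaticTypeImpl {
  final String name;

  // Every library contributing a type to this type's hierarchy.
  final Iterable<Library> libraries;

  StaticTypeImpl(this.name, this.libraries);
}

// Add the type's hierarchy libraries to a macro application's dependency set.
void recordTypeDependencies(StaticTypeImpl type, Set<Uri> dependencies) {
  for (final library in type.libraries) {
    dependencies.add(library.uri);
  }
}

void main() {
  final object = Library(Uri.parse('dart:core'));
  final base = Library(Uri.parse('package:a/base.dart'));
  final type = StaticTypeImpl('MyClass', [base, object]);
  final deps = <Uri>{};
  recordTypeDependencies(type, deps);
  print(deps.length); // 2
}
```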

Constant evaluation

If/when we enable constant evaluation, we will also need to track the library dependencies of that, for any const identifiers that were read.

Identifiers in Code objects

Identifiers themselves could also change which library they come from, which would affect the imports in the generated macro augmentations.

For all calls to resolveDeclaration and resolveIdentifier in the augmentation library code, we should track the Uris of the resolved identifiers as well, and add those to the set of dependencies.
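A hedged sketch of that tracking, using a simplified stand-in resolver (the real resolveIdentifier signature lives in the macro builder APIs; `TrackingResolver` and its scope map are hypothetical):

```dart
// Hypothetical sketch: wrap identifier resolution so that the URI of every
// resolved identifier is added to the macro application's dependency set.
class Identifier {
  final String name;
  final Uri libraryUri; // where the identifier resolved to
  Identifier(this.name, this.libraryUri);
}

class TrackingResolver {
  final Set<Uri> dependencies = {};
  final Map<String, Identifier> _scope; // stand-in for real resolution

  TrackingResolver(this._scope);

  Identifier resolveIdentifier(Uri library, String name) {
    final identifier = _scope[name];
    if (identifier == null) {
      // "Not found" is also an answer the invalidation logic depends on.
      dependencies.add(library);
      throw ArgumentError('Unresolved identifier: $name');
    }
    dependencies.add(identifier.libraryUri);
    return identifier;
  }
}

void main() {
  final resolver = TrackingResolver({
    'JsonCodable':
        Identifier('JsonCodable', Uri.parse('package:json/json.dart')),
  });
  resolver.resolveIdentifier(
      Uri.parse('package:json/json.dart'), 'JsonCodable');
  print(resolver.dependencies); // {package:json/json.dart}
}
```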

OmittedTypeAnnotation

When types are omitted, they might be inferred. We handle that in the macro APIs through the special OmittedTypeAnnotation object, which we later allow you to infer through a separate call.

When you do that inference, we need to add the proper dependencies for which libraries that inference depended on, which could get complicated.

New types being introduced in immediate imports

If a new type is added to the import scope, it could change how existing type annotations resolved, without a change to the libraries those types came from.

This would usually result in an error that has to be resolved anyway, but for core types it would not, and there might also be other situations like that, so we need to think carefully about it.

We could say that everything in the import scope is an implicit dependency, as an example (so all immediate imports and their transitive exports).
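A sketch of computing that implicit dependency set, assuming a simplified library graph (`LibraryNode` and `importScopeDependencies` are hypothetical names):

```dart
// Hypothetical sketch of "everything in the import scope is an implicit
// dependency": collect each immediate import plus the transitive closure
// of its exports.
class LibraryNode {
  final Uri uri;
  final List<LibraryNode> imports;
  final List<LibraryNode> exports;
  LibraryNode(this.uri, {this.imports = const [], this.exports = const []});
}

Set<Uri> importScopeDependencies(LibraryNode library) {
  final result = <Uri>{};
  for (final import in library.imports) {
    _addWithExports(import, result);
  }
  return result;
}

void _addWithExports(LibraryNode library, Set<Uri> result) {
  if (!result.add(library.uri)) return; // already visited
  for (final export in library.exports) {
    _addWithExports(export, result);
  }
}

void main() {
  final core = LibraryNode(Uri.parse('dart:core'));
  // A "convenience library" that re-exports others pulls them all in.
  final convenience =
      LibraryNode(Uri.parse('package:x/x.dart'), exports: [core]);
  final lib =
      LibraryNode(Uri.parse('package:app/a.dart'), imports: [convenience]);
  print(importScopeDependencies(lib).length); // 2
}
```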

jakemac53 avatar May 20 '24 17:05 jakemac53

cc @davidmorgan @scheglov

jakemac53 avatar May 20 '24 17:05 jakemac53

There are a few complications, at least in the analyzer.

  1. When we ask
      var methods = await builder.methodsOf(superclassDeclaration);
      methods[0].returnType;

...the method could have its returnType inferred from a superclass declared in another library. We could probably say that if a method has some omitted types, it depends on the libraries that declare overridden methods.

  2. When we resolve an identifier, e.g. await builder.typeDeclarationOf(superclass.identifier), we depend not only on the library where this identifier was found. We also depend on the fact that it was not found in other imported libraries. Otherwise it would be a conflict. Not sure yet how to solve this.

  3. In the analyzer we store not separate library summaries, but whole library cycles. And we store these summaries into the file system cache using the API signature of all dependencies as the key. It looks like we would need to store macro application results separately, using a more relaxed key externally, e.g. the combination of just the URIs of the library cycle (as opposed to API signatures); and internally we will have a map with a per-library key to all macro results in the library (practically the test.macro.dart code, and maybe diagnostics - more complicated).

  4. To make it explicit: in the analyzer we will still relink the whole library cycle, using all dependencies. Making this portion of the analyzer incremental is a separate interesting project that might take non-trivial time. We will only try to reuse macro application results when we think it is safe to do so.

scheglov avatar May 20 '24 18:05 scheglov

...the method could have its returnType inferred from a superclass declared in another library. We could probably say that if a method has some omitted types, it depends on the libraries that declare overridden methods.

This should be an OmittedTypeAnnotation - we do have an API to infer those types but it is explicit, and we could add in the dependency on the root library at that point? This can be added to the harder cases list :).

2. We also depend on the fact that it was not found in other imported libraries. Otherwise it would be a conflict. Not sure yet how to solve this.

Interesting point, this could be a challenge, but it would result in an error in the user code anyway, right? So maybe we don't invalidate the macro, but the user does have to do something to resolve it either way. So it might be OK.

3. In the analyzer we store not separate library summaries, but whole library cycles.

It does sound like you would need a separate cache for macro execution results (which happen to be serializable already).

To simplify things, you could even group the dependencies of all the macros that run on a single file into one cached result. So you either re-use the entire macro augmentation file or you don't, in which case the contents of the cache could be the literal resulting augmentation file, which would also avoid doing the merging and optimizing/formatting etc.

The downside is that if any macro in the file has to re-run, you have to re-run all of them. But I think this would still have a huge positive effect overall and quite possibly simplify things further. It might even be faster on the whole, if you can cache the entire resulting merged and optimized augmentation file.
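A rough sketch of such a per-file cache, using string API signatures as stand-ins for real hashes (`AugmentationCache` and all names here are hypothetical):

```dart
// Hypothetical sketch of the coarse per-file cache suggested above: the key
// combines the file URI with the API signatures of the libraries its macros
// depended on, and the value is the entire merged augmentation file.
class AugmentationCache {
  final _cache = <String, String>{};

  String _key(Uri file, Map<Uri, String> dependencySignatures) {
    // Sort entries so the key does not depend on map iteration order.
    final parts = dependencySignatures.entries.toList()
      ..sort((a, b) => '${a.key}'.compareTo('${b.key}'));
    return '$file|${parts.map((e) => '${e.key}=${e.value}').join(',')}';
  }

  String? lookup(Uri file, Map<Uri, String> dependencySignatures) =>
      _cache[_key(file, dependencySignatures)];

  void store(Uri file, Map<Uri, String> dependencySignatures,
          String augmentation) =>
      _cache[_key(file, dependencySignatures)] = augmentation;
}

void main() {
  final cache = AugmentationCache();
  final file = Uri.parse('package:app/a.dart');
  cache.store(file, {Uri.parse('package:b/b.dart'): 'sig1'}, 'augment ...');
  // A changed dependency signature misses the cache, forcing a re-run.
  print(cache.lookup(file, {Uri.parse('package:b/b.dart'): 'sig2'})); // null
  // An unchanged signature hits, so the whole file is reused as-is.
  print(cache.lookup(file, {Uri.parse('package:b/b.dart'): 'sig1'}));
}
```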

4. We will only try to reuse macro application results when we think it is safe to do so.

Yes, more incremental invalidation in general is a task for another time. I just want to get to a state where macros don't severely regress performance for incremental edits to a single file, when lots of libraries with macros depend on that file.

jakemac53 avatar May 20 '24 20:05 jakemac53

To clarify I also want this optimization to work for large library cycles - we should still have a good invalidation story that is more granular than library cycles, ideally. There is a concern that lots of applications are just one big library cycle today.

jakemac53 avatar May 20 '24 20:05 jakemac53

I updated the section above to capture the OmittedTypeAnnotation case as well as the new types being introduced case.

jakemac53 avatar May 20 '24 20:05 jakemac53

Nothing super interesting yet, but some base to build upon. https://dart-review.googlesource.com/c/sdk/+/368160

For now my thinking is that I need to get back from the macro framework the list of Declarations and Identifiers that the macro saw. Maybe as part of MacroExecutionResult. There are some caches outside of the analyzer, so the analyzer cannot know for sure that if some Declaration was not accessed through the API, the macro actually did not access it. Correct me if I'm wrong about caching, when we start with empty cache, etc.

scheglov avatar May 24 '24 18:05 scheglov

Correct me if I'm wrong about caching, when we start with empty cache, etc.

The caching today is always per macro application (per phase also). So it should be safe to do it on your end. Although the macro executor code could also do it, which might help with the eventual CFE implementation, if this is already done.

jakemac53 avatar May 24 '24 20:05 jakemac53

I think the implementation is not as easy as described. Or at least there is complexity behind "except for the results that depended on the library which caused the invalidation". Or maybe I don't understand the proposed solution, and see implied simplicity where there is none; or the opposite - I see complexity and don't understand that the solution is actually simple.

I'm looking at this from the perspective of loading libraries from the file system cache after a restart, with zero or more files possibly changed since the file system cache was filled. Not live updates of currently loaded libraries after a single file change; that does not fit well with the current analyzer architecture.

Let's consider two APIs.

We are running macros in the library L1.

  1. Future<macro.Identifier> resolveIdentifier(Uri library, String name). It either returns an identifier that is exported from library, or throws an exception. Let's say that the library with Uri library is L2.

There are 3 variants for library.

First, the library can be a library outside of the library cycle being linked. In this case it is easy to record the dependency - check whether the export scope of library has an element named name with the same URI as during the macro application. This will probably cover references to dart:core, package:flutter, etc. When we load cached macro results, we can access the LibraryElements of these outside libraries and check that they still export (or not - an exception is also a result) the same elements (practically just the URI of the declaring library for the requested elements).

Second, L2 is L1 - the library with the macro application. Most often there is no dependency at all. We already use macro results only if the library code is unchanged. The identifier is either already there, or is added by other macros by the time we make this request (do we support this use case?); and for those macros we either reuse the whole library's macro results or nothing, so the results will be the same anyway. Complication: exports (the requested element could be exported from a different library, e.g. we moved the class and export both libraries), but this is probably rare, so we can ignore it initially and rerun macros if there are exports.

Third, L2 is a library in the same library cycle that we are linking. Then the logic is mostly the same as in (second). But now we require that the results of running macros in L2 are reused (so the macros produced the same results, so the requested element is the same). So, we reuse cached L1 only if we can reuse L2. Complication: exports, again.
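The "outside library" check in the first variant could be sketched as follows: at macro time record, per requested name, the URI of the declaring library (or null if resolution threw), and on reload reuse the cached results only if the current export scope gives the same answers (`canReuse` and the scope function are hypothetical):

```dart
// Hypothetical sketch of the export-scope reuse check. "Not found" (null)
// is recorded too, because an exception is also a result we depend on.
bool canReuse(
  Map<String, Uri?> recorded, // name -> declaring library URI at macro time
  Uri? Function(String name) currentExportScope,
) {
  for (final entry in recorded.entries) {
    if (currentExportScope(entry.key) != entry.value) return false;
  }
  return true;
}

void main() {
  final recorded = {'A': Uri.parse('package:a/a.dart'), 'B': null};

  // Same answers as before: cached macro results can be reused.
  Uri? unchanged(String name) =>
      name == 'A' ? Uri.parse('package:a/a.dart') : null;

  // 'A' now comes from a different library: must re-run.
  Uri? moved(String name) =>
      name == 'A' ? Uri.parse('package:a/src/a.dart') : null;

  print(canReuse(recorded, unchanged)); // true
  print(canReuse(recorded, moved)); // false
}
```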

  2. Future<macro.TypeDeclaration> typeDeclarationOf(macro.Identifier identifier), a more complicated API.

How do we get the identifier? There are several possibilities, but for now let's consider that it is a superclass, in c1.dart.

import 'a.dart';
import 'c2.dart';
import 'c3.dart';

@Macro1()
class X extends A {}

If A is already declared in c1.dart, there is again no dependency.

Otherwise, we will use A exported from a.dart (an external library), c2.dart, or c3.dart. We know the exact Element at the time of the macro execution.

Here, the fact that a.dart is an external library does not help us when we are checking if macro results can be reused. Yes, we can know which A it exports (or not). But this does not mean that we will use this A, because another A could be exported from c2.dart. Yes, this is an error, I read that. But I would not like to ignore this; a change is a change, and it would be better to be precise. So, we need both c2.dart and c3.dart to export (or not) the same A. And even if they don't explicitly declare A, any macro could generate one. So, we can reuse macro results of c1.dart only if we can reuse macro results of c2.dart and c3.dart.

And this does not necessarily even require a compile-time error. I can imagine

// c2.dart
import 'y.dart';
@Macro2() // generate A if Y extends Z
class C2 extends Y {}
// c3.dart
import 'y.dart';
@Macro3() // generate A if Y does not extend Z
class C3 extends Y {}

So that changes to Y in y.dart can result in generating A either in c2.dart or in c3.dart.

So, it looks to me that almost always we can reuse either all macro results in the library cycle, or none.

We will still mostly reuse macro results, and a change to a deep dependency will almost never invalidate every transitive consumer. But if we run macros, we will run them for the whole library cycle.

Obviously, let me know if there are holes in my logic.

scheglov avatar May 26 '24 00:05 scheglov

But this does not mean that we will use this A, because another A could be exported from c2.dart. Yes, this is an error, I read that. But I would not like to ignore this; a change is a change, and it would be better to be precise. So, we need both c2.dart and c3.dart to export (or not) the same A. And even if they don't explicitly declare A, any macro could generate one. So, we can reuse macro results of c1.dart only if we can reuse macro results of c2.dart and c3.dart.

Importantly, this does not add a dependency on all the files in the library cycle, only the immediate imports.

Yes, it would add more dependencies, but as long as we are considering the final library API from those dependencies (after macro execution) we should usually still get incremental invalidation even within library cycles, unless one of the immediate dependencies' macro outputs was affected by the change.

This does somewhat break down if using "convenience libraries" which export tons of other libraries, etc. But it should scale based on the immediate imports + their exports still.

So that changes to Y in y.dart can result in generating A either in c2.dart or in c3.dart.

The dependency on the library defining Y only needs to exist if the macro actually looks at the class declaration for Y, or gets a StaticType for Y.

We will still mostly reuse macro results, and a change to a deep dependency will almost never invalidate every transitive consumer. But if we run macros, we will run them for the whole library cycle.

What specifically makes the library cycles different here? I am not really understanding what concretely makes them different from other import structures. In other words, let's say A depends on B which depends on C. If we also add a dependency from C to A, that shouldn't affect the invalidation semantics when C is changed.

jakemac53 avatar May 28 '24 16:05 jakemac53

But this does not mean that we will use this A, because another A could be exported from c2.dart. Yes, this is an error, I read that. But I would not like to ignore this; a change is a change, and it would be better to be precise. So, we need both c2.dart and c3.dart to export (or not) the same A. And even if they don't explicitly declare A, any macro could generate one. So, we can reuse macro results of c1.dart only if we can reuse macro results of c2.dart and c3.dart.

Importantly, this does not add a dependency on all the files in the library cycle, only the immediate imports.

This is correct, strictly speaking we depend only of a.dart, c2.dart, and c3.dart, not every library in the cycle.

The implementation would be simpler if we redo macro results for the whole library cycle.

But I guess the main question here is whether we can afford it, or whether we think there are large library cycles with many libraries that have macros applied. Although in that case, because we depend on immediate imports, with high probability we will still redo all macros, because there is a high probability that one of the immediate imports does use macros. So, if there are a few libraries with macros in the cycle, it does not matter much if we redo them all; and if there are many such libraries, we will redo them all anyway.

Yes, it would add more dependencies, but as long as we are considering the final library API from those dependencies (after macro execution) we should usually still get incremental invalidation even within library cycles, unless one of the immediate dependencies' macro outputs was affected by the change.

I'm a bit confused by "considering the final library API from those dependencies". When dependencies of a library are inside the same library cycle, at the moment when we are about to start building the elements, there is no final API yet. We can know the top-level declarations from user code, but we don't know what macros will generate this time.

This does somewhat break down if using "convenience libraries" which export tons of other libraries, etc. But it should scale based on the immediate imports + their exports still.

Yes, with library level dependencies exports are additional complication. No such issues with cycle level dependencies.

So that changes to Y in y.dart can result in generating A either in c2.dart or in c3.dart.

The dependency on the library defining Y only needs to exist if the macro actually looks at the class declaration for Y, or gets a StaticType for Y.

Yes, I supposed that macros in c2.dart and c3.dart do look at the declaration of Y, and at the identifier of its superclass.

But my point was that external changes might cause changes to macros, and so c1.dart depends on c2.dart and c3.dart.

We will still mostly reuse macro results, and a change to a deep dependency will almost never invalidate every transitive consumer. But if we run macros, we will run them for the whole library cycle.

What specifically makes the library cycles different here? I am not really understanding what concretely makes them different from other import structures. In other words, let's say A depends on B which depends on C. If we also add a dependency from C to A, that shouldn't affect the invalidation semantics when C is changed.

Well, in this example, it is true that if A applies macros it depends only on B. But if C does not have macros, it does not matter whether we redo macros for the whole cycle; C has nothing to do. And if C does have macros, then we will redo them if we redo them for A, the same as doing them for the whole cycle.

We can do per-library dependencies, I just wondered if this will make significant difference in the performance, while adding complication in the implementation.

scheglov avatar May 28 '24 17:05 scheglov

There is an assumption that large library cycles do exist in practice, and that we need to support them well. This is not an assumption based on evidence as far as I know, but more just based on an educated guess. It is at least plausible to assume these large library cycles do exist, given the fact that we do not tell users to split up their applications into several packages, and so it would be very easy to create large library cycles in apps without a lot of discipline in sub-package library structure.

I'm a bit confused by "considering the final library API from those dependencies". When dependencies of a library are inside the same library cycle, at the moment when we are about to start building the elements, there is no final API yet. We can know the top-level declarations from user code, but we don't know what macros will generate this time.

Ah, this is the piece that I was missing. I think we actually require for library cycles, that you run each phase across all libraries in the cycle, before continuing to the next phase in any of them. They are treated essentially as if they are one large library, from this perspective.

Because of that, you cannot run all the macros on just one library in the cycle (which was invalidated), in order to see if other things should be invalidated based on that result changing.

Is that correct?

jakemac53 avatar May 28 '24 18:05 jakemac53

There is an assumption that large library cycles do exist in practice, and that we need to support them well. This is not an assumption based on evidence as far as I know, but more just based on an educated guess. It is at least plausible to assume these large library cycles do exist, given the fact that we do not tell users to split up their applications into several packages, and so it would be very easy to create large library cycles in apps without a lot of discipline in sub-package library structure.

Yes, I can see that it might happen. I also have not seen data that supports or disproves the existence of large library cycles.

I'm a bit confused by "considering the final library API from those dependencies". When dependencies of a library are inside the same library cycle, at the moment when we are about to start building the elements, there is no final API yet. We can know the top-level declarations from user code, but we don't know what macros will generate this time.

Ah, this is the piece that I was missing. I think we actually require for library cycles, that you run each phase across all libraries in the cycle, before continuing to the next phase in any of them. They are treated essentially as if they are one large library, from this perspective.

Correct, we run each phase across all libraries in the cycle, before continuing to the next phase in any of them.

Because of that, you cannot run all the macros on just one library in the cycle (which was invalidated), in order to see if other things should be invalidated based on that result changing.

Is that correct?

Yes.

scheglov avatar May 28 '24 18:05 scheglov

Given that, it does sound like a more complicated problem than I anticipated.

@davidmorgan has some related ideas to share at the sync up today, so we can try to evaluate whether it solves this problem any better or not.

jakemac53 avatar May 28 '24 19:05 jakemac53

Thanks Konstantin, thanks Jake.

Per our chat yesterday: I think we should reframe this discussion in terms of data, which is simpler than dependencies. So the question is: what data can the macro host provide; what data does the macro need; how does it change as we run through the phases; when and how can input to one macro change due to output of another macro.

Right now I don't understand enough about the execution model inside the analyzer or the CFE @johnniwinther so I am surely saying things that don't make sense :) ... one thing I'd like to get to is an "implementation doc" that describes what a macro host does in sufficient detail that it describes what is happening both in the analyzer and the CFE including any performance-relevant differences. For example if the analyzer and CFE necessarily take a different approach with different performance characteristics then that might need considering for the design as a whole.

I realize that I have been unhelpfully working in docs instead of on GitHub, let me try to pivot to using GitHub more :)

Suggestion, maybe we start a series of "implementation docs" living next to the feature spec and expanding on what is specified with implementation detail/design, would that be a good way to make progress?

https://github.com/dart-lang/language/blob/main/working/macros/feature-specification.md

I can go ahead and create some wrong outlines to be filled in with the actual details if that's a good way to start ;)

Or, any other approach someone would like to suggest.

Thanks.

davidmorgan avatar May 29 '24 08:05 davidmorgan

I shared a small document with you.

Discussing dependencies is the same as discussing data. You get a dependency because you asked for a piece of data. Anything that could affect the provided answer, e.g. a ClassDeclaration, or the generated resulting code, is now your dependency. The protocol, and how much to answer at once, does not matter IMHO. That is just an optimization.

So, we still need to decide what to do with A if it could be macro generated either in c2.dart or c3.dart.

scheglov avatar May 29 '24 16:05 scheglov

Thanks @scheglov, much appreciated :) I sent a PR adding your notes next to the macros spec https://github.com/dart-lang/language/pull/3852 so we can refer to it + evolve into a more general doc including CFE. @johnniwinther a description like this for what the CFE does would be great please, so we can start to understand the differences :)

Re: data or deps, you are right that the difference does not matter for correctness. However the optimization from thinking about data is likely to be very large :) ... it is in the nature of macros that they consume only a small fraction of the information from their dependencies, so almost all reruns triggered by changed dependencies are unnecessary.

As an extreme example, most macros only consume a small fraction of the library they are applied in: probably a single class, when the library might have tens or hundreds of classes. So the % of same-library changes that actually need a rerun is already probably <10% even before looking at imports.

Correctness questions are of course also important :)

I'm going to keep notes on investigation around data-oriented macros on language #3706, I have started to check in exploratory code there that we can use for experiments around performance and correctness.

davidmorgan avatar May 30 '24 11:05 davidmorgan

https://github.com/dart-lang/language/tree/main/working/macros/dart_model now has a runnable example comparing the current macro implementation vs data model approach on a huge library cycle.

The trivial macros in the example only want the list of field names, so the data model approach can easily know that the list of field names did not change, and only rerun for one file.

There is a pretty big gap in complexity between this and something like the JSON macro, I will see next how far I can push it and what useful results I can get along the way :)

davidmorgan avatar May 30 '24 17:05 davidmorgan

it is in the nature of macros that they consume only a small fraction of the information from their dependencies

You maybe think about dependencies as whole libraries or imports. I try to think about them as information that you saw and might have used. For the same library, if you asked for nothing but were implicitly given the target class declaration, then your dependency is that class and its properties. Not even the methods declared in there, just the header.

For the purpose of fine-grained invalidation the problem is not the dependencies, but the outputs of macros. As above: where did A come from, which macro could have generated it? Currently I think there is no way to know; macros don't declare what their output is.

scheglov avatar May 30 '24 17:05 scheglov

I have put some more thought into what I briefly mentioned in the macro implementation sync up. In order to work around the library cycle issue, it seems to me like we need to cache the macro outputs from each phase.

In the situation mentioned above, with deps like A -> B -> C -> A, assuming macro applications in all 3 libraries, and an API change to C, we could do essentially the following:

  • Immediately invalidate B and all its macros (C could be shadowing things from B now).
  • Run all phase 1 macro applications on B.
  • Compare the result with previous phase 1 result (ideally, this would be a hash of the API signature of the declarations only, but it could be a hash of the entire augmentation file).
  • If the result is different, invalidate A and all its macros (identifiers could have been shadowed), and run phase 1 macros on A (this could in turn invalidate other libraries if the result changes from previous result).
  • Run all phase 2 macro applications on B (and possibly A if it was previously invalidated).
  • Compare result with previous phase 2 result.
  • If the result is different, invalidate macro applications from phases 2 and 3 in A (it is also OK to re-run phase 1, but I don't believe it should be required). Run phase 2 macros in A (which could again invalidate other libraries).
  • Run phase 3 macros across all invalidated libraries. This shouldn't be able to cause further invalidation.
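The walk above could be sketched roughly as follows, with string hashes standing in for API signatures of phase results (`Lib`, `runPhases`, and the reverse edges are all hypothetical simplifications):

```dart
// Hypothetical sketch of the phase-by-phase invalidation walk: after each
// phase, a changed result hash propagates invalidation to the libraries
// that import the changed one.
class Lib {
  final String name;
  final List<Lib> importedBy; // reverse import edges within the cycle
  final Map<int, String> previousPhaseHash; // phase -> cached result hash
  Lib(this.name, {this.importedBy = const [], Map<int, String>? hashes})
      : previousPhaseHash = hashes ?? {};
}

Set<Lib> runPhases(
  Set<Lib> initiallyInvalid,
  String Function(Lib lib, int phase) runPhase,
) {
  final invalid = {...initiallyInvalid};
  for (var phase = 1; phase <= 3; phase++) {
    final queue = [...invalid];
    while (queue.isNotEmpty) {
      final lib = queue.removeLast();
      final hash = runPhase(lib, phase);
      if (hash != lib.previousPhaseHash[phase]) {
        lib.previousPhaseHash[phase] = hash;
        // A changed result may shadow identifiers in importers.
        for (final importer in lib.importedBy) {
          if (invalid.add(importer)) queue.add(importer);
        }
      }
    }
  }
  return invalid;
}

void main() {
  final a = Lib('A');
  final b = Lib('B', importedBy: [a], hashes: {1: 'old'});
  // B's phase 1 output changes, so A is pulled into the invalid set too.
  final invalidated = runPhases({b}, (lib, phase) => '${lib.name}-p$phase');
  print(invalidated.map((l) => l.name).toList()..sort()); // [A, B]
}
```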

@scheglov does that make sense? Does it sound feasible?

jakemac53 avatar May 30 '24 20:05 jakemac53

We used to run the declarations phase for all libraries of the cycle at once, IIRC because we can introspect other types. Is it safe now to run the declarations phase on individual libraries?

During the types phase, I think we cannot reference any local identifier, right?

/*macro*/ class MyMacro implements ClassTypesMacro {
  const MyMacro();

  @override
  FutureOr<void> buildTypesForClass(declaration, ClassTypeBuilder builder) {
    builder.declareType(
      'X',
      DeclarationCode.fromParts([
        'class X extends ',
        declaration.superclass!.code,
        ' {}',
      ]),
    );
  }
}
@MyMacro()
class B extends A {}

This macro throws an exception

  Invalid argument(s): Unresolved identifier: A
  #0      DeclarationBuilder.resolveIdentifier (package:analyzer/src/summary2/macro_declarations.dart:308:9)
  #1      _Builder._buildIdentifier (package:_macros/src/executor/augmentation_library.dart:73:53)
  #2      _Builder._buildCode (package:_macros/src/executor/augmentation_library.dart:133:9)
  #3      _Builder._buildCode (package:_macros/src/executor/augmentation_library.dart:131:9)
  #4      _Builder.build (package:_macros/src/executor/augmentation_library.dart:156:9)
  #5      AugmentationLibraryBuilder.buildAugmentationLibrary (package:_macros/src/executor/augmentation_library.dart:23:10)

We cannot resolve A because we are still generating types, and cannot possibly know how it will be resolved. See the document that I shared - types are resolved after the types phase.

We could do theoretically

    var c = declaration.library.uri.resolve('c.dart');
    builder.declareType(
      'X',
      DeclarationCode.fromParts([
        'class X extends ',
        await builder.resolveIdentifier(c, 'C'),
        ' {}',
      ]),
    );

But I think it should not be allowed to resolve identifiers from the same library cycle, because there is no defined order in which the types phase macros are applied across libraries. Maybe C, then B; but maybe B and then C. And then we will not be able to resolve package:test/c.dart@C. Right?

So, it seems to me that the types phase in B cannot depend on the types phase in C.

scheglov avatar May 30 '24 21:05 scheglov

For questions of correctness related to ordering, I think we should consider moving to a model with no ordering #3858 ... we can now compare using test cases, hopefully this can be a way to push to a convincing decision one way or the other.

Regarding performance, I am now 90% convinced that evolving the current execution model into an optimal or near-optimal one is not possible. Fine-grained requests and tracking dependencies dynamically based on them cannot be as fast in a serialization-based execution model as broad requests and data structures that can be diffed / recombined.

So I think it would be good to refocus effort on poking holes in the proposed new approach: if there are reasons it won't work, let's find them as quickly as possible; if it will work, let's switch to it. I am ready to add functionality/experiments/benchmarks as needed :) and welcome suggestions.

What convinced me is the benchmark in my exploratory code, documented in the README, in which I apply three trivial macros to a 67k LOC library cycle with 64 files and 640 macro applications.

In my exploratory code, the "macros" introspect by making three RPCs and receiving three responses. Not three responses per macro: three responses for the whole codebase. The data is then chopped up and passed to a generate method that is as nice to write as any other generate method:

  String generateFor(Interface clazz) {
    final result = StringBuffer();

    result.writeln('augment class ${clazz.name} {');
    result.writeln('  bool operator==(Object other) =>');
    result.writeln('      other is ${clazz.name}');
    for (final field in clazz.members.values.where((m) => m.isField)) {
      result.writeln('    && other.${field.name} == this.${field.name}');
    }
    result.writeln(';}');

    return result.toString();
  }

then the macros make another RPC each with the results, for a total of nine messages; this is a big example, so they are big results, about 3.5 MB of data:

dart bin/main.dart /tmp/dart_model_benchmark/dartModel/package_under_test
~~~ setup
Launching analyzer on: /tmp/dart_model_benchmark/dartModel/package_under_test
Listening on localhost:26199.
~~~ hosting
Incoming connection.
  EqualsMacro augments 64 uri(s), 1891591 char(s).
  HashCodeMacro augments 64 uri(s), 980681 char(s).
  ToStringMacro augments 64 uri(s), 726593 char(s)

So on startup the current implementation sends 640 * 3 * 2 * (# introspection calls) messages, plus 640 * 3 for the results: many thousands, when it's possible to cut this down to nine.
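As a rough check of that count (my arithmetic only; mirroring the formula above, with the per-application introspection count as an assumption):

```dart
// Back-of-envelope message count for the current execution model.
// The split into 640 applications and 3 macros mirrors the formula in
// the text; the introspection-call count is an assumed example value.
int startupMessages({
  required int applications,
  required int macros,
  required int introspectionCalls,
}) {
  // Each introspection call is a request plus a response, and each
  // macro application also sends one message with its results.
  return applications * macros * 2 * introspectionCalls +
      applications * macros;
}

void main() {
  // With e.g. three introspection calls per application:
  print(startupMessages(
      applications: 640, macros: 3, introspectionCalls: 3)); // 13440
}
```

Even with just three introspection calls per application this is well over ten thousand messages, versus nine in the exploratory model.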

Then when you go and change one file, the current implementation has to rerun everything because it's a library cycle, while the exploratory implementation just reruns one application; six messages sent, three updates to macro input and three (much smaller this time) updates to macro output.

  EqualsMacro augments 1 uri(s), 29976 char(s).
  HashCodeMacro augments 1 uri(s), 15539 char(s).
  ToStringMacro augments 1 uri(s), 11560 char(s).
Write: /tmp/dart_model_benchmark/dartModel/package_under_test/lib/a0.a.dart

So again it's thousands of requests vs just a few, and the analyzer only has to do work on the output a0.a.dart instead of 64 sets of augmentations.
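The library-level dependency tracking proposed in this issue, which makes this kind of selective re-run possible, can be sketched roughly as follows (hypothetical types and names, not the real analyzer/CFE implementation):

```dart
// Sketch of library-granularity dependency tracking for macro
// invalidation (hypothetical API; assumes results are keyed by some
// stable per-application identifier).
class MacroApplicationResult {
  final String augmentationCode;
  // Every library whose declarations this application introspected.
  final Set<Uri> dependencies;
  MacroApplicationResult(this.augmentationCode, this.dependencies);
}

class MacroResultCache {
  final Map<String, MacroApplicationResult> _byApplication = {};

  void record(String applicationId, MacroApplicationResult result) {
    _byApplication[applicationId] = result;
  }

  /// Returns the cached result, or null if [changedLibrary] is one of
  /// the libraries this application depended on, so it must re-run.
  MacroApplicationResult? reuse(String applicationId, Uri changedLibrary) {
    final cached = _byApplication[applicationId];
    if (cached == null || cached.dependencies.contains(changedLibrary)) {
      return null;
    }
    return cached;
  }
}
```

Applications in the changed library itself always re-run, since the declaration they are applied to always belongs to that library, so its URI is always in their dependency set.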

A complex macro could need hundreds of introspection calls in realistic cases, so there will be large codebases where the current implementation will need the best part of a million(!) messages passed, redescribing most of the package from scratch, and the exploratory implementation still needs six, describing only what changed.

As I said, this feels pretty conclusive to me--looking forward to hearing what others think. Thanks!

davidmorgan avatar May 31 '24 09:05 davidmorgan

I still don't actually fully understand what exactly is being proposed with the data model approach enough to be able to really provide good feedback at this time.

I think we need some much more specific details about what is actually being proposed to understand what the potential pitfalls might be?

jakemac53 avatar May 31 '24 15:05 jakemac53

@jakemac53 That's fair :) I'm short on time before the weekend starts here, let me summarize quickly and I can do further writeup as needed on Monday.

Summary added to the query-like API issue since it belongs better there :) feel free to throw any questions/observations there, too. Thanks!

davidmorgan avatar May 31 '24 15:05 davidmorgan

@scheglov This identifier resolution is only happening in the augmentation library creation phase, right?

It seems likely that we will start implicitly including the "parent" library's import scope in augmentation files (with the move to "part" file unification). This might allow us to not do these resolveIdentifier calls at all, or to provide a fallback: if an identifier can't be resolved, we just use the raw name plus the original prefix, with no new import. The "original prefix" might be a bit tricky, not sure.

We could even more generally rely on this kind of resolution for identifiers which came from the original library.

Would that help here? I think the specific issue is that the import might change in the augmentation?

jakemac53 avatar May 31 '24 18:05 jakemac53

Ah, interesting.

import 'c2.dart';
import 'c3.dart';

@Macro1()
class X extends A {}

Well, when we talk about reusing macro-generated augmentations, we want to make sure that the reused code is the same as we would create if we re-ran the macros in the library. It can mean something different (e.g. when A moves from c2.dart to c3.dart). Remember that we relink element models, so the meaning does not matter. When we reuse the macro-generated code, it goes through all the phases of resolution: build elements, resolve references, top-level types, etc. So, if the code that we reuse does not rely on a specific import like import 'c2.dart' as prefix1, such that we generate prefix1.A somewhere, then we can reuse this code. These are kind of "as-is" identifiers.

Yeah, we can still have a conflict: here the method foo shadows the import prefix.

import 'dart:math' as foo;

class A {
  final pi = foo.pi;
  
  void foo() {}
}

So, this probably will solve the issue with types. Time to think about the declarations phase, and introspections there :-)

scheglov avatar May 31 '24 20:05 scheglov

/tmp/dart_model_benchmark/json-manual/package_under_test
[time: 819 ms]
(name: <scheduler>, count: 0, elapsed: 0:00:00.000000, elapsedSelf: -0:00:00.765840)
  (name: getUnitElement, count: 20, elapsed: 0:00:00.765840, elapsedSelf: 0:00:00.291550)(bytesPut: 1792746, cycleCount: 2, libraryCount: 27)
    (name: link, count: 2, elapsed: 0:00:00.474290, elapsedSelf: 0:00:00.377295)
      (name: computeLibraryScopes, count: 2, elapsed: 0:00:00.096821, elapsedSelf: 0:00:00.073057)
        (name: buildMacroApplier, count: 2, elapsed: 0:00:00.023740, elapsedSelf: 0:00:00.023740)
        (name: executeMacroTypesPhase, count: 2, elapsed: 0:00:00.000024, elapsedSelf: 0:00:00.000024)
      (name: executeMacroDeclarationsPhase, count: 2, elapsed: 0:00:00.000065, elapsedSelf: 0:00:00.000065)
      (name: executeMacroDefinitionsPhase, count: 2, elapsed: 0:00:00.000066, elapsedSelf: 0:00:00.000066)
      (name: mergeMacroAugmentations, count: 2, elapsed: 0:00:00.000043, elapsedSelf: 0:00:00.000029)
        (name: buildAugmentationLibraryCode, count: 27, elapsed: 0:00:00.000014, elapsedSelf: 0:00:00.000014)

/tmp/dart_model_benchmark/json-macro/package_under_test
[time: 2659 ms]
(name: <scheduler>, count: 0, elapsed: 0:00:00.000000, elapsedSelf: -0:00:02.610777)
  (name: getUnitElement, count: 21, elapsed: 0:00:02.610777, elapsedSelf: 0:00:00.003318)
    (name: libraryContext, count: 21, elapsed: 0:00:02.607459, elapsedSelf: 0:00:00.126986)(bytesPut: 7485370, cycleCount: 5, libraryCount: 30)
      (name: link, count: 5, elapsed: 0:00:02.290993, elapsedSelf: 0:00:00.269205)
        (name: computeLibraryScopes, count: 5, elapsed: 0:00:00.158152, elapsedSelf: 0:00:00.058843)
          (name: buildMacroApplier, count: 5, elapsed: 0:00:00.099226, elapsedSelf: 0:00:00.099226)
          (name: executeMacroTypesPhase, count: 5, elapsed: 0:00:00.000083, elapsedSelf: 0:00:00.000083)
        (name: executeMacroDeclarationsPhase, count: 5, elapsed: 0:00:00.231387, elapsedSelf: 0:00:00.000777)
          (name: executeDeclarationsPhase, count: 221, elapsed: 0:00:00.095637, elapsedSelf: 0:00:00.095637)(constructorsOf: 200, methodsOf: 200)
          (name: addMacroResults, count: 200, elapsed: 0:00:00.134973, elapsedSelf: 0:00:00.002016)
            (name: buildAugmentationLibraryCode, count: 200, elapsed: 0:00:00.004949, elapsedSelf: 0:00:00.004949)
            (name: kind.addMacroAugmentation, count: 200, elapsed: 0:00:00.045280, elapsedSelf: 0:00:00.006240)(length: 49900)
              (name: getFileForUri, count: 200, elapsed: 0:00:00.039040, elapsedSelf: 0:00:00.012479)
                (name: [newFile][refresh], count: 200, elapsed: 0:00:00.026561, elapsedSelf: 0:00:00.001563)
                  (name: getUnlinkedUnit, count: 200, elapsed: 0:00:00.024998, elapsedSelf: 0:00:00.015811)
                    (name: parse, count: 200, elapsed: 0:00:00.009187, elapsedSelf: 0:00:00.000102)
                      (name: parseCode, count: 200, elapsed: 0:00:00.009085, elapsedSelf: 0:00:00.009085)(length: 49900)
            (name: _addMacroAugmentation, count: 200, elapsed: 0:00:00.006208, elapsedSelf: 0:00:00.001918)
              (name: parse, count: 200, elapsed: 0:00:00.004290, elapsedSelf: 0:00:00.000073)
                (name: parseCode, count: 200, elapsed: 0:00:00.004217, elapsedSelf: 0:00:00.004217)(length: 49900)
            (name: elements + types, count: 200, elapsed: 0:00:00.076520, elapsedSelf: 0:00:00.076520)
        (name: executeMacroDefinitionsPhase, count: 5, elapsed: 0:00:01.006704, elapsedSelf: 0:00:00.000503)
          (name: executeDefinitionsPhase, count: 221, elapsed: 0:00:00.352882, elapsedSelf: 0:00:00.352882)(constructorsOf: 200, methodsOf: 200, resolve: 600, typeDeclarationOf: 200)
          (name: addMacroResults, count: 200, elapsed: 0:00:00.653319, elapsedSelf: 0:00:00.003306)
            (name: buildAugmentationLibraryCode, count: 200, elapsed: 0:00:00.067662, elapsedSelf: 0:00:00.067662)
            (name: kind.addMacroAugmentation, count: 200, elapsed: 0:00:00.308338, elapsedSelf: 0:00:00.026328)(length: 2314900)
              (name: getFileForUri, count: 200, elapsed: 0:00:00.282010, elapsedSelf: 0:00:00.018974)
                (name: [newFile][refresh], count: 200, elapsed: 0:00:00.263036, elapsedSelf: 0:00:00.002433)
                  (name: getUnlinkedUnit, count: 200, elapsed: 0:00:00.260603, elapsedSelf: 0:00:00.091925)
                    (name: parse, count: 200, elapsed: 0:00:00.168678, elapsedSelf: 0:00:00.000132)
                      (name: parseCode, count: 200, elapsed: 0:00:00.168546, elapsedSelf: 0:00:00.168546)(length: 2314900)
            (name: _addMacroAugmentation, count: 200, elapsed: 0:00:00.157929, elapsedSelf: 0:00:00.002751)
              (name: parse, count: 200, elapsed: 0:00:00.155178, elapsedSelf: 0:00:00.000146)
                (name: parseCode, count: 200, elapsed: 0:00:00.155032, elapsedSelf: 0:00:00.155032)(length: 2314900)
            (name: elements + types, count: 200, elapsed: 0:00:00.116084, elapsedSelf: 0:00:00.116084)
        (name: mergeMacroAugmentations, count: 5, elapsed: 0:00:00.625545, elapsedSelf: 0:00:00.018498)
          (name: buildAugmentationLibraryCode, count: 30, elapsed: 0:00:00.068067, elapsedSelf: 0:00:00.068067)
          (name: parseCode, count: 20, elapsed: 0:00:00.149372, elapsedSelf: 0:00:00.000103)(length: 2327350)
            (name: parseCode, count: 20, elapsed: 0:00:00.149269, elapsedSelf: 0:00:00.149269)(length: 2327350)
          (name: kind.addMacroAugmentation, count: 20, elapsed: 0:00:00.247920, elapsedSelf: 0:00:00.021420)
            (name: getFileForUri, count: 20, elapsed: 0:00:00.226500, elapsedSelf: 0:00:00.003221)
              (name: [newFile][refresh], count: 20, elapsed: 0:00:00.223279, elapsedSelf: 0:00:00.000398)
                (name: getUnlinkedUnit, count: 20, elapsed: 0:00:00.222881, elapsedSelf: 0:00:00.065864)
                  (name: parse, count: 20, elapsed: 0:00:00.157017, elapsedSelf: 0:00:00.000028)
                    (name: parseCode, count: 20, elapsed: 0:00:00.156989, elapsedSelf: 0:00:00.156989)(length: 2327350)
          (name: importedFile.parse(), count: 20, elapsed: 0:00:00.141688, elapsedSelf: 0:00:00.000055)(length: 2327350)
            (name: parseCode, count: 20, elapsed: 0:00:00.141633, elapsedSelf: 0:00:00.141633)(length: 2327350)
      (name: macroCompileKernel, count: 1, elapsed: 0:00:00.189480, elapsedSelf: 0:00:00.189480)

From these statistics, if we don't have to re-run macros and can reuse the previously generated code, we can subtract:

(name: executeDeclarationsPhase, count: 221, elapsed: 0:00:00.095637, elapsedSelf: 0:00:00.095637)
(name: buildAugmentationLibraryCode, count: 200, elapsed: 0:00:00.004949, elapsedSelf: 0:00:00.004949)

(name: executeDefinitionsPhase, count: 221, elapsed: 0:00:00.352882, elapsedSelf: 0:00:00.352882)
(name: buildAugmentationLibraryCode, count: 200, elapsed: 0:00:00.067662, elapsedSelf: 0:00:00.067662)

(name: buildAugmentationLibraryCode, count: 30, elapsed: 0:00:00.068067, elapsedSelf: 0:00:00.068067)

Which is 0.589197 s in total. So, if we don't run any macros, we can get down to 2659 - 589 = 2070 ms. Still not the 819 ms of the manual code.

It looks like there are other expensive actions in the analyzer, notably parsing. We parse each partial augmentation twice: first when we add the FileState, to extract unlinked data, and a second time when we build elements. Theoretically we can avoid this.

(name: parseCode, count: 200, elapsed: 0:00:00.004217, elapsedSelf: 0:00:00.004217)(length: 49900)
(name: parseCode, count: 200, elapsed: 0:00:00.155032, elapsedSelf: 0:00:00.155032)(length: 2314900)
(name: parseCode, count: 20, elapsed: 0:00:00.149372, elapsedSelf: 0:00:00.000103)(length: 2327350)
(name: parseCode, count: 20, elapsed: 0:00:00.141633, elapsedSelf: 0:00:00.141633)(length: 2327350)

Which would save 0.450254 s = 450 ms. This would give 2070 - 450 = 1620 ms. Still not the 819 ms of the manual code.

We can also subtract (name: macroCompileKernel, count: 1, elapsed: 0:00:00.189480); this gives us 1620 - 189 = 1431 ms.
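Checking the running total of the three subtractions above (numbers copied from the profile):

```dart
// Running total of the estimated savings: macro execution (589 ms),
// duplicate parsing (450 ms), and kernel compilation (189 ms),
// subtracted from the 2659 ms total of the macro run.
int afterSavings(int totalMs, Iterable<int> savingsMs) =>
    savingsMs.fold(totalMs, (remaining, saving) => remaining - saving);

void main() {
  final remaining = afterSavings(2659, [589, 450, 189]);
  print('$remaining ms'); // 1431 ms, vs. 819 ms for the manual code
}
```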

At this point it seems unavoidable that the version with macros will always be slower. We de facto have to parse every piece of macro-generated code twice: the first time after running a single macro application, and the second time after merging the macro results.

scheglov avatar Jun 17 '24 22:06 scheglov

Thanks Konstantin, very useful to see timing detail :)

The concern about having to parse twice is the sort of thing I was thinking about when I filed quantify the cost of merging into analyzer and CFE data models, really this is core to the whole feature.

I'm a bit lost with all the zeros, so here is the same output with times under 1 s shown in ms :)

[time: 819 ms]
(name: <scheduler>, count: 0, elapsed: 0ms, elapsedSelf: -766ms)
  (name: getUnitElement, count: 20, elapsed: 766ms, elapsedSelf: 292ms)(bytesPut: 1792746, cycleCount: 2, libraryCount: 27)
    (name: link, count: 2, elapsed: 474ms, elapsedSelf: 377ms)
      (name: computeLibraryScopes, count: 2, elapsed: 97ms, elapsedSelf: 73ms)
        (name: buildMacroApplier, count: 2, elapsed: 24ms, elapsedSelf: 24ms)
        (name: executeMacroTypesPhase, count: 2, elapsed: 0ms, elapsedSelf: 0ms)
      (name: executeMacroDeclarationsPhase, count: 2, elapsed: 0ms, elapsedSelf: 0ms)
      (name: executeMacroDefinitionsPhase, count: 2, elapsed: 0ms, elapsedSelf: 0ms)
      (name: mergeMacroAugmentations, count: 2, elapsed: 0ms, elapsedSelf: 0ms)
        (name: buildAugmentationLibraryCode, count: 27, elapsed: 0ms, elapsedSelf: 0ms)

/tmp/dart_model_benchmark/json-macro/package_under_test
[time: 2659 ms]
(name: <scheduler>, count: 0, elapsed: 0ms, elapsedSelf: -0:00:02.610777)
  (name: getUnitElement, count: 21, elapsed: 0:00:02.610777, elapsedSelf: 3ms)
    (name: libraryContext, count: 21, elapsed: 0:00:02.607459, elapsedSelf: 127ms)(bytesPut: 7485370, cycleCount: 5, libraryCount: 30)
      (name: link, count: 5, elapsed: 0:00:02.290993, elapsedSelf: 269ms)
        (name: computeLibraryScopes, count: 5, elapsed: 158ms, elapsedSelf: 59ms)
          (name: buildMacroApplier, count: 5, elapsed: 99ms, elapsedSelf: 99ms)
          (name: executeMacroTypesPhase, count: 5, elapsed: 0ms, elapsedSelf: 0ms)
        (name: executeMacroDeclarationsPhase, count: 5, elapsed: 231ms, elapsedSelf: 1ms)
          (name: executeDeclarationsPhase, count: 221, elapsed: 96ms, elapsedSelf: 96ms)(constructorsOf: 200, methodsOf: 200)
          (name: addMacroResults, count: 200, elapsed: 135ms, elapsedSelf: 2ms)
            (name: buildAugmentationLibraryCode, count: 200, elapsed: 5ms, elapsedSelf: 5ms)
            (name: kind.addMacroAugmentation, count: 200, elapsed: 45ms, elapsedSelf: 6ms)(length: 49900)
              (name: getFileForUri, count: 200, elapsed: 39ms, elapsedSelf: 12ms)
                (name: [newFile][refresh], count: 200, elapsed: 27ms, elapsedSelf: 2ms)
                  (name: getUnlinkedUnit, count: 200, elapsed: 25ms, elapsedSelf: 16ms)
                    (name: parse, count: 200, elapsed: 9ms, elapsedSelf: 0ms)
                      (name: parseCode, count: 200, elapsed: 9ms, elapsedSelf: 9ms)(length: 49900)
            (name: _addMacroAugmentation, count: 200, elapsed: 6ms, elapsedSelf: 2ms)
              (name: parse, count: 200, elapsed: 4ms, elapsedSelf: 0ms)
                (name: parseCode, count: 200, elapsed: 4ms, elapsedSelf: 4ms)(length: 49900)
            (name: elements + types, count: 200, elapsed: 77ms, elapsedSelf: 77ms)
        (name: executeMacroDefinitionsPhase, count: 5, elapsed: 0:00:01.006704, elapsedSelf: 1ms)
          (name: executeDefinitionsPhase, count: 221, elapsed: 353ms, elapsedSelf: 353ms)(constructorsOf: 200, methodsOf: 200, resolve: 600, typeDeclarationOf: 200)
          (name: addMacroResults, count: 200, elapsed: 653ms, elapsedSelf: 3ms)
            (name: buildAugmentationLibraryCode, count: 200, elapsed: 68ms, elapsedSelf: 68ms)
            (name: kind.addMacroAugmentation, count: 200, elapsed: 308ms, elapsedSelf: 26ms)(length: 2314900)
              (name: getFileForUri, count: 200, elapsed: 282ms, elapsedSelf: 19ms)
                (name: [newFile][refresh], count: 200, elapsed: 263ms, elapsedSelf: 2ms)
                  (name: getUnlinkedUnit, count: 200, elapsed: 261ms, elapsedSelf: 92ms)
                    (name: parse, count: 200, elapsed: 169ms, elapsedSelf: 0ms)
                      (name: parseCode, count: 200, elapsed: 169ms, elapsedSelf: 169ms)(length: 2314900)
            (name: _addMacroAugmentation, count: 200, elapsed: 158ms, elapsedSelf: 3ms)
              (name: parse, count: 200, elapsed: 155ms, elapsedSelf: 0ms)
                (name: parseCode, count: 200, elapsed: 155ms, elapsedSelf: 155ms)(length: 2314900)
            (name: elements + types, count: 200, elapsed: 116ms, elapsedSelf: 116ms)
        (name: mergeMacroAugmentations, count: 5, elapsed: 626ms, elapsedSelf: 18ms)
          (name: buildAugmentationLibraryCode, count: 30, elapsed: 68ms, elapsedSelf: 68ms)
          (name: parseCode, count: 20, elapsed: 149ms, elapsedSelf: 0ms)(length: 2327350)
            (name: parseCode, count: 20, elapsed: 149ms, elapsedSelf: 149ms)(length: 2327350)
          (name: kind.addMacroAugmentation, count: 20, elapsed: 248ms, elapsedSelf: 21ms)
            (name: getFileForUri, count: 20, elapsed: 227ms, elapsedSelf: 3ms)
              (name: [newFile][refresh], count: 20, elapsed: 223ms, elapsedSelf: 0ms)
                (name: getUnlinkedUnit, count: 20, elapsed: 223ms, elapsedSelf: 66ms)
                  (name: parse, count: 20, elapsed: 157ms, elapsedSelf: 0ms)
                    (name: parseCode, count: 20, elapsed: 157ms, elapsedSelf: 157ms)(length: 2327350)
          (name: importedFile.parse(), count: 20, elapsed: 142ms, elapsedSelf: 0ms)(length: 2327350)
            (name: parseCode, count: 20, elapsed: 142ms, elapsedSelf: 142ms)(length: 2327350)
      (name: macroCompileKernel, count: 1, elapsed: 189ms, elapsedSelf: 189ms)

Re: needing to parse twice, would/could that cost go away if macro augmentations stayed in separate "files" instead of merging?

Thanks.

davidmorgan avatar Jun 18 '24 07:06 davidmorgan

It looks like https://github.com/dart-lang/language/issues/3868 talks about a different meaning of the word "merge": updating the element model after running each single macro application, using a single set of macro results. My message above is talking about merging all macro results, after all macro applications, into a single library augmentation, and updating the corresponding FileStates.

Yes, if we keep the generated code of each macro application in a separate file, as happens transiently today, we can avoid generating and parsing the merged code. So we will not have to parse all the macro-generated code twice.

But we want to merge into a single library augmentation, because otherwise the UX in the IDE will be bad.

As for timings, we can look at (name: mergeMacroAugmentations, count: 5, elapsed: 0:00:00.625545, elapsedSelf: 0:00:00.018498), but a portion of it was already discounted when we arrived at 1431 ms vs. 819 ms (manual code). I think we already discounted about 350 ms, which leaves about 275 ms to save, so 1156 ms vs. 819 ms.

Counting it slightly differently I get to 2659 - 96 - 4 - 353 - 68 - 155 - 626 - 189 = 1168 ms vs. 819 ms, so about the same.

I guess the remaining 350 ms difference of macro vs. manual is because the macro code is somewhat bigger (2548 KB vs. 1862 KB), has more elements (external + definitions), and some other overhead related to running macros.

scheglov avatar Jun 18 '24 16:06 scheglov

Yes, that issue uses "merge" in a more general sense, merging macro results into a single library is part of the whole "merge macro output into the host model".

UX is of course important ... I haven't really thought about the tradeoffs of merging vs. not merging; is the main downside of not merging that you can't easily view all the augmentations, and you'd have to click through from all the points that get augmented? ...

Thanks.

davidmorgan avatar Jun 19 '24 13:06 davidmorgan

A library is usually one file: you can open it and scroll, look at all methods at once, and quickly navigate back and forth.

Macro generated code should be just like this - single file, so that you can glance at everything, read easily, etc.

Jumping between a hundred tiny files will overwhelm both the developer and the IDE; the developer will start to forget how they got there, the IDE will start closing previous editor tabs, etc. It is almost impossible to get the full picture.

scheglov avatar Jun 19 '24 17:06 scheglov