language Reduce use of prefixes in macro output

Looking at code generated by a test macro in the analyzer.

library augment 'test.dart';

import 'package:test/a.dart' as prefix0;
import 'dart:core' as prefix1;

augment class A {
  @prefix0.AugmentGetter()
  external int get foo;
  augment prefix1.int get foo => 42;
}

If you look carefully, you will notice an asymmetry here - int get foo without an import prefix, and prefix1.int get foo with one. Does this mean that writing just a string literal external int get foo; into declareInType() is wrong? Should absolutely every identifier go through resolveIdentifier()? Do we expect that macro authors will know this? What bad will happen if they don't, and will write, as I did, just a string literal?

Should we validate resulting code and report such violations?

Can we try to avoid generating import prefixes when they are not required? How much code have you seen that required importing dart:core with a prefix to write prefix1.int? I saw it only once, in the analyzer tests, to check that we can :-) To me code with import prefixes looks not so tidy. It is correct, but not nice. I think we should encourage macro authors to write good looking code, with good formatting, etc. And if we will generate import prefixes, so that code does not look good no matter what, the quality will always stay low. I also think that good looking generated code is important for the "wow" effect.

Oh, maybe I could do this on the analyzer side. Get the correct "prefixed" version, and then "optimize" it. If an identifier does not have any conflicts in the scope, remove the prefix. But to do this more efficiently we would need to get from buildAugmentationLibrary() not only the string, but also for every Identifier all locations where it was written.

@jakemac53

Dec 05 '23 05:12 scheglov

Oh, maybe I could do this on the analyzer side.

Would it be a problem for the generated code to be different?

My assumption (so this is more an attempt to gain understanding than to voice an opinion) is that it would lead to a poor UX for users if the code generated by the analyzer deviates from the code generated by the front-end / VM. (Where exactly is the executed code being generated?)

The most obvious example would seem to be when debugging code. If the user steps into generated code in the IDE, and the code shown doesn't match the code being executed, then the execution point marker will be in the wrong location making it hard for users to figure out what's actually being executed.

Dec 05 '23 16:12 bwilkerson

Does this mean that writing just a string literal external int get foo; into declareInType() is wrong?

Yes, all top level identifiers should be emitted using actual Identifier instances, even ones from dart:core. The main issue here is that if you mix and match things (across different macros even), then as soon as a prefixed dart:core import is added that will make the unprefixed identifiers no longer work.

Should we validate resulting code and report such violations?

Possibly, I am not sure if it is worth doing or not. An error should be produced at some point later on if it causes an issue, but it is an easy mistake to make for sure.

Maybe the easiest fix would be to always add a prefixed import of dart:core, even if it is unused. This would cause errors for any unprefixed references to top level symbols from dart:core.

Certain symbols macros cannot get true identifiers for, and have to rely on the raw name - for instance when referencing local variables or parameters.

Can we try to avoid generating import prefixes when they are not required? How much code have you seen that required importing dart:core with a prefix to write prefix1.int?

Eventually, we could try and do that yes. I consider it very low priority, I am much more focused on the general semantics and core functionality. The logic here is shared, and can be safely improved/changed at any time in a non-breaking way.

It is a requirement that whatever we do is every bit as safe as prefixed imports though, the edge cases matter.

Oh, maybe I could do this on the analyzer side.

Would it be a problem for the generated code to be different?

Exactly, it is more important that the code is consistent across tools than that it is pretty. So any such logic should live in the shared code for building augmentation libraries.

Dec 05 '23 16:12 jakemac53

Oh, of course the same transformation should be done also when CFE, so that both analyzer and CFE produce identical code.

I just was thinking how to implement it. While we compose code in the shared code, we don't know yet all the declarations that will be in scope.

void f() {
  [int] int = 0;
}

When we write the identifier at [int] we don't know that there is int local variable below. So, we need to build the whole code, then parse it, build scopes, and check every identifier reference.

So, we would need to pass this code to analyzer or CFE to parse and check scopes. So, I thought that maybe just do the whole "optimization" in the analyzer or CFE. OTOH, there are chances of differences in implementation. I guess running on shared set of tests that verify generated code, will help.

I consider it very low priority, I am much more focused on the general semantics and core functionality. The logic here is shared, and can be safely improved/changed at any time in a non-breaking way.

I think it might be more preferable to show something looking good, even if with less features implemented. Complaining about code style if much easier than about missing features.

Dec 05 '23 17:12 scheglov

So, we need to build the whole code, then parse it, build scopes, and check every identifier reference.

This sounds expensive to me? Wouldn't we also likely need to emit a whole new library after this and re-parse it in order to get the correct offsets etc?

Dec 05 '23 18:12 jakemac53

So, we need to build the whole code, then parse it, build scopes, and check every identifier reference.

This sounds expensive to me? Wouldn't we also likely need to emit a whole new library after this and re-parse it in order to get the correct offsets etc?

Yes, we would need to re-generate the library augmentation, and then re-parse it. I think nice looking code is worth to pay a bit for it.

Parsing is fairly cheap.

Benchmark here

import 'package:analyzer/dart/analysis/analysis_context_collection.dart';
import 'package:analyzer/dart/analysis/results.dart';

void main() {
  var collection = AnalysisContextCollection(
    includedPaths: [
      '/Users/scheglov/Source/Dart/sdk.git/sdk/pkg/analyzer',
    ],
  );

  for (var i = 0; i < 10; i++) {
    final timer = Stopwatch()..start();
    var contentLength = 0;
    final analysisContext = collection.contexts.first;
    var currentSession = analysisContext.currentSession;
    for (final path in analysisContext.contextRoot.analyzedFiles()) {
      if (path.endsWith('.dart')) {
        var unitResult = currentSession.getParsedUnit(path);
        unitResult as ParsedUnitResult;
        contentLength += unitResult.content.length;
      }
    }
    print('timer: ${timer.elapsedMilliseconds} ms');
    print('contentLength: $contentLength');
    print(
        'speed: ${(contentLength / timer.elapsedMilliseconds) * 1000.0 / 1024} KB / second');
  }
}

Output, after warm up:

timer: 635 ms
contentLength: 18717438
speed: 28785.42999507874 KB / second

Dec 05 '23 18:12 scheglov

If we can demonstrate that it can scale efficiently, etc, I won't stand in the way of such an optimization. But I am very worried about "death by a thousand cuts" here. This is partially why we aren't currently formatting the code either.

I would rather start from a known baseline of performance, and then we can layer on improvements like this once we can measure their real world impact on code bases?

It is also worth noting that code_builder, a fairly heavily used package for code generation, uses this same import prefix approach. I haven't seen any complaints about it 🤷 .

Dec 05 '23 19:12 jakemac53