Modular compilation

Open asgerf opened this issue 9 years ago • 2 comments

Use this issue to track general modular compilation concerns.

This is very incomplete, but the general plan so far is this:

The binary format currently uses symbolic references, which generally become meaningless if the referenced file is recompiled. Name-based references were avoided in order to simplify generation of synthetic classes and members without needing to create unique names for them.

However, if we also emit a "header" file in which all method bodies are stripped out (but which otherwise uses the same format), and that header file does not change, we can skip recompiling dependent libraries, because their symbolic references are known to remain intact. If the header file does change, the dependent libraries probably need recompilation for other reasons anyway.
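A minimal sketch of that invalidation rule, written in Python for brevity. It assumes headers are available as bytes and are compared by SHA-256 digest. The function names, the `dependents` map, and the use of a digest are all illustrative assumptions, not part of any kernel tooling.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint a serialized header file (assumed: SHA-256 over its bytes)."""
    return hashlib.sha256(data).hexdigest()


def dependents_to_recompile(library, old_header, new_header, dependents):
    """Return the libraries to rebuild after `library` itself was recompiled.

    `dependents` is a hypothetical map from a library to the libraries
    that reference it.
    """
    if digest(old_header) == digest(new_header):
        # Same header: symbolic references into `library` are still valid.
        return set()
    # Header changed: every direct dependent may hold stale references.
    return set(dependents.get(library, ()))
```

Only direct dependents are returned here. A real build would repeat the check for each rebuilt dependent, since its own header may or may not change in turn.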

asgerf, May 20 '16 11:05

Just some notes from our discussion this morning:

Representation of cross-file references

We should be able to continue using symbolic references even when generating one binary file per module. One idea here was to include a list of external library URLs in the binary file. When libraries are serialized, a library internal to the module will be embedded as it is today, but an external library simply contains its external URL. Within the rest of the binary format, all references continue to be encoded in the same way as before (library-id + symbol-id).
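To make the idea concrete, here is a rough Python sketch of such a library table. The `Library` class, the dictionary layout, and both function names are hypothetical; the actual binary encoding is not specified in this discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    url: str
    symbols: list = field(default_factory=list)
    external: bool = False  # True if the library lives in another module


def serialize_library_table(libraries):
    """Embed internal libraries; record only the URL of external ones."""
    table = []
    for lib in libraries:
        if lib.external:
            table.append({"external_url": lib.url})
        else:
            table.append({"url": lib.url, "symbols": list(lib.symbols)})
    return table


def encode_reference(libraries, url, symbol):
    """Encode a reference as (library-id, symbol-id), internal or external."""
    for library_id, lib in enumerate(libraries):
        if lib.url == url:
            return (library_id, lib.symbols.index(symbol))
    raise LookupError(f"unknown library: {url}")
```

Note that the symbol list of an external library is still needed in memory to compute the symbol-id, even though it is not written out. That is why the ids only stay valid while the external library's API is unchanged.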

Splitting of a "header" file

One possible implementation here is to reuse the kernel format, just treating pieces we don't need as null. In other words, we would have two serialization passes over the IR: one generating the full binary format, the other treating method bodies as null and producing only API-level data.
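A small Python sketch of the two-pass idea, under the assumption that one serializer is run twice with a flag. JSON stands in for the real binary format, and the library/member dictionary shape is invented for illustration.

```python
import json


def serialize(library, include_bodies):
    """Serialize `library`; with include_bodies=False, emit only the header."""
    members = [
        {
            "name": m["name"],
            "signature": m["signature"],
            # Same shape in both passes; the header just nulls the body.
            "body": m["body"] if include_bodies else None,
        }
        for m in library["members"]
    ]
    doc = {"url": library["url"], "members": members}
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def emit(library):
    """Run both passes and return (full_file, header_file)."""
    return serialize(library, True), serialize(library, False)
```

Because both outputs share one format, a reader for the full file can also load the header without a second parser.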

Modeling reexports explicitly

In the context of these header files, we talked about making sure that we detect API-level changes that are transitively visible, even though the kernel format might erase the intermediate steps through which a dependency was introduced.

A good example of this is Dart "reexports". Consider this example:

a.dart:

import 'b.dart';
class A extends B {}

b.dart:

export 'b1.dart' show B;
...
// b depends on both b1 and b2 already
import 'b1.dart' hide B;
import 'b2.dart' hide B;
...

Both b1.dart and b2.dart define different implementations of B:

class B { ... }
...

Imagine we create a separate module for each of these libraries:

  • The kernel file of a.dart will mention that A's superclass is b1.B.
  • If we don't encode exported symbols, the kernel of b.dart would be empty.

If we change b.dart to say:

export 'b2.dart' show B;

we wouldn't see a change in the kernel of b.dart and therefore not necessarily notice that the kernel of a.dart needs to be recomputed. Adding some information to the header file that b.dart exports b1.B/b2.B will make it possible to correctly detect this scenario.
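The scenario can be modeled with a toy header in Python. The `build_header` function and its dictionary layout are hypothetical. The only point is that a header which records the export table differs between the two versions of b.dart, while one without it does not.

```python
def build_header(url, declared, exports=None):
    """Build the API-level view of a library.

    `exports` maps each re-exported name to the library that defines it;
    pass None to model a header that does not record exports.
    """
    header = {"url": url, "declared": sorted(declared)}
    if exports is not None:
        header["exports"] = dict(sorted(exports.items()))
    return header


# b.dart declares nothing itself in this toy model; it only re-exports B.
without_exports_before = build_header("b.dart", [])
without_exports_after = build_header("b.dart", [])
with_exports_before = build_header("b.dart", [], {"B": "b1.dart"})
with_exports_after = build_header("b.dart", [], {"B": "b2.dart"})

# Without the export table, the change is invisible.
print(without_exports_before == without_exports_after)
# With it, the headers differ, so a.dart is known to need recompilation.
print(with_exports_before == with_exports_after)
```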

sigmundch, May 26 '16 23:05

Commit 3ac36acd4c54a3616ebb68d5cacb5b5c590a26c1 brings significant changes. I'm just going to paste its commit message here for reference:

The concept of a binary library file no longer exists.

A kernel file can contain any number of libraries, and some of these
libraries can be "external".  To reference a class or member from
another build, the class or member must be declared in an external
library.

Members in an external library contain all their type information,
but have no body.

Classes in an external library have their hierarchy information
present, but are not guaranteed to contain all their actual members.

The idea is that references themselves don't really cross module
boundaries, but rather refer to a local definition whose body is
contributed from elsewhere, much like 'external' members in Dart.

A modular backend such as DDC should be able to compile from one of
these kernel files without needing to load auxiliary information from
summaries or other kernel files.

For whole program transformations or backends, a linking step, which
is not yet implemented, must merge classes and members in external
libraries based on their name.

External libraries share the same IR and binary format as ordinary
libraries.  Transformations that affect the interface for a member
or class should transform the external libraries alongside with the
internal ones, ideally without needing to treat them any different.
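Since the commit message says the linking step is not yet implemented, the following Python sketch is only a guess at what merging by name could look like. The module layout (library URL mapped to an `external` flag and a name-to-body map) and the `link` function are assumptions made for illustration.

```python
def link(modules):
    """Merge separately built modules into one whole-program view.

    Each module maps a library URL to
    {"external": bool, "members": {name: body_or_None}}.
    """
    # Pass 1: collect real definitions from non-external libraries.
    definitions = {}
    for module in modules:
        for url, lib in module.items():
            if lib["external"]:
                continue
            for name, body in lib["members"].items():
                definitions[(url, name)] = body

    # Pass 2: resolve every member, external or not, by (library, name).
    program = {}
    for module in modules:
        for url, lib in module.items():
            for name in lib["members"]:
                if (url, name) not in definitions:
                    raise LookupError(f"no definition for {url}::{name}")
                program.setdefault(url, {})[name] = definitions[(url, name)]
    return program
```

A body-less member in an external library is replaced by the definition from the module that actually contains it. A name with no definition anywhere is reported as an error.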

asgerf, Oct 04 '16 17:10