
Teach source generation to reference more interesting types.

Open chandlerc opened this issue 1 year ago • 0 comments

This teaches our source generation tool to create interesting type references. This includes both referencing a weighted distribution of explicitly specified types and referencing types that are being defined in the generated file.

Generating more interesting explicit types will exercise more of Carbon's prelude, but because C++ doesn't have an automatic prelude with fundamental types like int64_t or tuples, we include some minimal headers when generating the C++ analog. This likely makes the comparison more fair rather than less fair as Carbon's toolchain isn't processing just the generated source, but also its prelude.
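As a rough illustration of the "minimal headers" idea, the sketch below picks the `#include` lines needed for the types a generated C++ file actually uses. The table `CPP_TYPE_HEADERS` and the function `cpp_include_lines` are hypothetical names invented for this example; they are not taken from the source generator. The only facts relied on are that `int64_t` and `int32_t` come from `<cstdint>` and `std::tuple` from `<tuple>`.

```python
# Hypothetical mapping from C++ type spellings to the standard header that
# declares them. Built-in types such as `bool` need no header and are absent.
CPP_TYPE_HEADERS = {
    "int32_t": "<cstdint>",
    "int64_t": "<cstdint>",
    "std::tuple": "<tuple>",
}


def cpp_include_lines(used_types):
    """Return sorted, de-duplicated #include lines for the given types."""
    headers = {
        CPP_TYPE_HEADERS[name] for name in used_types if name in CPP_TYPE_HEADERS
    }
    return ["#include " + header for header in sorted(headers)]


if __name__ == "__main__":
    for line in cpp_include_lines(["int64_t", "std::tuple", "bool", "int32_t"]):
        print(line)
```

Emitting only the headers that are needed keeps the C++ side close to what Carbon's prelude provides implicitly, without pulling in unrelated library code.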

The current set of fixed types is based primarily on the set of types that the toolchain currently implements and a set that seems reasonably interesting to exercise for compile time performance. We want to try to cover things that should be optimized in the toolchain, even if a single source file might not typically hit all of them.

The weights of everything are completely arbitrary, based on intuition and some hand inspection of some random source files. There is also an intentional bias towards non-zero coverage, so the tail is much larger than it would be in real code. As a result, the weights reflect the priority of optimizing compile time more than the distribution observed in practice. We can refine the weighting scheme in the future, potentially with multiple modes to separate coverage from maximally representative weights, etc. The goal is just to have a starting point.
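To make the weighting idea concrete, here is a minimal sketch of one way to turn weights into a fixed number of type references while guaranteeing non-zero coverage. The type names, the weights, and the function `allocate_type_references` are all assumptions made for illustration; the actual tool may sample quite differently.

```python
import random

# Hypothetical weights; the names and numbers are illustrative only.
TYPE_WEIGHTS = {
    "i32": 40,
    "i64": 20,
    "bool": 20,
    "f64": 10,
    "(i32, i32)": 5,
    "i32*": 5,
}


def allocate_type_references(weights, total, rng):
    """Return `total` type names in random order, each type at least once."""
    if total < len(weights):
        raise ValueError("need at least one reference per type")
    # Non-zero coverage: reserve one reference for every type first.
    counts = {name: 1 for name in weights}
    remaining = total - len(weights)
    weight_sum = sum(weights.values())
    # Distribute the rest proportionally to the weights, rounding down.
    for name, weight in weights.items():
        counts[name] += remaining * weight // weight_sum
    # Rounding down can leave a few slots unused; give them to the
    # heaviest types so the total is exact.
    leftover = total - sum(counts.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        counts[name] += 1
    refs = [name for name, count in counts.items() for _ in range(count)]
    # Only the order is random; the per-type counts are deterministic.
    rng.shuffle(refs)
    return refs
```

Because the counts are computed before shuffling, two runs with different seeds produce the same multiset of references in a different order. Reserving one slot per type up front is what inflates the tail relative to the raw weights.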

The scheme for referencing the defined types requires some care and complexity. It has to avoid referencing types before they are defined while still referencing all of the types defined. It also has to keep the number of references stable even as the order is randomized to avoid fixed patterns in the source code.
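A minimal sketch of one way to meet these constraints follows. It is not the scheme the tool implements; `plan_class_references` and its parameters are hypothetical. Each class in a shuffled definition order references only classes defined before it, and always references its immediate predecessor so that no earlier class is left unreferenced.

```python
import random


def plan_class_references(class_names, refs_per_class, rng):
    """Return (order, refs); refs[name] lists only earlier-defined classes."""
    if refs_per_class < 1:
        raise ValueError("refs_per_class must be at least 1")
    order = list(class_names)
    rng.shuffle(order)  # Randomize definition order to avoid fixed patterns.
    refs = {}
    for index, name in enumerate(order):
        earlier = order[:index]
        if not earlier:
            # The first class has nothing defined before it, so it can
            # only use fixed (non-class) types.
            refs[name] = []
            continue
        # Always reference the immediately preceding class, so every class
        # except the last one defined is referenced at least once.
        chosen = [earlier[-1]]
        # Fill the remaining slots with randomly chosen earlier classes.
        chosen += [rng.choice(earlier) for _ in range(refs_per_class - 1)]
        refs[name] = chosen
    return order, refs
```

With `n` classes the plan always contains `(n - 1) * refs_per_class` references, regardless of the shuffle, because the count depends on a class's position rather than its identity. The last class in the order cannot be referenced by any class, so in this sketch it would need a reference from something emitted after all class definitions, such as a function signature.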

All of this also triggered some minor refactoring of the state used to generate class definitions in the source generator. There are probably some good follow-on refactoring opportunities, but I'd prefer to leave those to future work.

I don't have any tests here because most of how this is observable is already tested -- the existing tests ensure the file sizes remain consistent and that the generated code is compiled correctly. But if folks have any ideas of useful tests here, happy to add them.

chandlerc, Aug 23 '24 02:08