
Scala Preprocessor / Conditional Compilation

Open szeiger opened this issue 5 years ago • 67 comments

Scala has done quite well so far without any preprocessor but in some situations it would be quite handy to just drop an #ifdef or #include into the source code. Let's resist this temptation (of using cpp) and focus instead on solving the actual problems that we have without adding too much complexity.

Goals

  • Conditional compilation which is more fine-grained than conditional source files.
  • Well integrated into the compiler: No change to build toolchains required. Positions work normally.

Non-goals

  • Lexical macros
  • Template expansion
  • Advanced predicate language

Status quo in Scala

  • Conditional source files
  • Code generation
    • Various code generation tools in use: Plain Scala code, FMPP, M4, etc.
  • https://github.com/sbt/sbt-buildinfo as a lightweight alternative for getting config values into source code

All of these require build tool support. Conditional source files are supported out of the box (for simple cross-versioning in sbt) or relatively easy to add manually. sbt-buildinfo is also ready to use. Code generation is more difficult to implement; different projects use various ad-hoc solutions.
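
For reference, sbt picks up src/main/scala-2.x directories out of the box; the manual equivalent of the conditional-source-files pattern looks roughly like this in a build.sbt (a sketch):

Compile / unmanagedSourceDirectories +=
  (Compile / sourceDirectory).value / s"scala-${scalaBinaryVersion.value}"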

Conditional compilation in other languages

C

Using the C preprocessor (cpp):

  • Powerful
  • Low-level
  • Error-prone (macro expansion, hygiene)
  • Solves many problems (badly) that Scala doesn't have (e.g. imports, macros)

HTML

Conditional comments:

  • Allows simple conditional processing
  • Dangerous errors are possible when tooling does not support them (conditional comments appear to be backwards compatible but really are not)

Rust

Built-in conditional compilation:

  • Predicates are limited to key==value checks, exists(key), any(ps), all(ps), not(p)
  • Configuration options set by the build system (some automatically, like platform and version, others user-definable)
  • Keys are not unique (i.e. every key is associated with a set of values)
  • 3 ways of conditional compilation:
    • cfg attribute (annotation in Scala) allowed where other attributes are allowed
    • cfg_attr generates attributes conditionally
    • cfg macro includes config values in the source code
  • Syntactic processing: Excluded source code must be parseable

Java

  • No preprocessor or conditional compilation support
  • static final boolean flags can be used for conditional compilation of well-typed code
  • Various preprocessing hacks based on preprocessor tools or conditional comments are used in practice

Haskell

Conditional compilation is supported by Cabal:

  • Using cpp with macros provided by Cabal for version-specific compilation

Design space

At which level should conditional compilation work?

  1. Before parsing: This keeps the config language separate from Scala. It is the most powerful option that allows arbitrary pieces of source code to be made conditional (or replaced by config values) but it is also difficult to reason about and can be abused to create very unreadable code.

  2. After lexing: This option is taken by cpp (at least conceptually by using the same lexer as C, even when implemented by a separate tool). It avoids some of the ugly corner cases of the first option (like being able to make the beginning or end of a comment conditional) while still being very flexible. An implementation for Scala would probably be limited to the default tokenizer state (i.e. no conditional compilation within XML expressions or string interpolation). Tokenization rules do not change very often or very much, so cross-compiling to multiple Scala versions should be easy.

  3. After parsing: This is the approach taken by Rust. It limits what can be made conditional (e.g. only single methods but not groups of multiple methods with a single directive) and requires valid syntax in all conditional parts. It cannot be used for version-dependent compilation that requires new syntax not supported by the older versions. An additional concern for Scala is the syntax. Using annotations like in Rust is possible but it would break existing Scala conventions that annotations must not change the interpretation of source code. It is also much harder to justify now (rather than from the beginning when designing a new language) because old tools would misinterpret source code that uses this new feature.

  4. After typechecking: This is too limiting in practice and can already be implemented (either using macros or with Scala's optimizer and compile-time constants, just like in Java); see the sketch after this list.
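
A minimal sketch of that existing constant-flag approach (all names are illustrative): the condition is a compile-time constant, so the optimizer can drop the dead branch, but both branches must still typecheck, which is exactly why this option is too limiting.

object BuildConfig {
  final val Scala213 = false // compile-time constant, e.g. written by a code generator
}

def collectionsCompat: String =
  if (BuildConfig.Scala213) "2.13 code path" // dead branch is eliminated when false,
  else "2.11/2.12 code path"                 // but it must still typecheck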

From my experience of cross-compiling Scala code and using conditional source directories, I think that option 3 is sufficiently powerful for most use cases. However, if we have to add a new syntax for it anyway (instead of using annotations), option 2 is worth considering.

Which features do we need?

Rust's cfg attribute + macro combination looks like a good solution for most cases. I don't expect a big demand for conditional annotations, so we can probably skip cfg_attr. The cfg macro can be implemented as a (compiler-intrinsic) macro in Scala; the attribute will probably require a dedicated syntax.

Sources of config options

Conditions for conditional compilation can be very complex. There are two options where this complexity can be expressed:

  • Keep the predicates in the Scala sources simple (e.g. only key==value checks), requiring the additional logic to be put into the build definition.
  • Or keep the build definition simple and allow more complexity in the predicates.

I prefer the first option. We already have a standard build tool which allows arbitrary Scala code to be run as part of the build definition. Other build tools have developed scripting support, too. The standalone scalac tool would not have to support anything more than allowing configuration options to be set from the command line. We should consider some predefined options, but even in seemingly simple cases (like the version number) this could quickly lead to a demand for a more complex predicate language.
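
A sketch of the first option in sbt (using the -Ckey=value compiler-option syntax described later in this thread; the flag name is made up):

scalacOptions ++= {
  // the complex predicate logic lives in the build definition...
  val is213 = CrossVersion.partialVersion(scalaVersion.value)
    .exists { case (major, minor) => major == 2 && minor >= 13 }
  // ...and only a simple flag reaches the compiler
  if (is213) Seq("-Cscala213=true") else Seq.empty
}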

szeiger avatar Jul 01 '19 17:07 szeiger

Looks really good, Stefan!

Do you think you could expand a bit on what is meant by Rust's cfg attribute and macro behaviour? Either just describe it or better yet with examples. Thanks!

dwijnand avatar Jul 02 '19 05:07 dwijnand

Yes, very nice writeup! Thanks for doing the hard work and not just dumping out some syntax ideas :-)

lrytz avatar Jul 02 '19 10:07 lrytz

The cfg annotation (or "attribute" in Rust) conditionally enables a piece of code (where an attribute is allowed, e.g. a function definition but not arbitrary places). In Scala it could be something like this:

@cfg(""" binaryVersion = "2.13" """)
def foo: Int = ... // 2.13 version

@cfg(""" not(binaryVersion = "2.13") """)
def foo: Int = ... // 2.11/2.12 version

binaryVersion in this example is a config option. Config options live in a namespace which is distinct from any regular namespace in Scala code. These annotations are processed logically after the parser but before the typer (probably not quite so in practice, because I expect you'll need to do some typing just to recognize the name cfg), so the disabled versions of the method have to parse but do not have to typecheck.

The cfg macro provides a way to bring config values into Scala terms, e.g.

println("The binary version is " + cfg("binaryVersion"))

Values produced by the macro are expanded into literals at compile time.

szeiger avatar Jul 02 '19 14:07 szeiger

A possible way to avoid the namer issue (especially at the top level) without too much complexity would be a new annotation-like syntax like @if(...). This would also allow us to avoid the quotes and instead treat all names within the predicate as config names.
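
Side by side, the two shapes would be (both hypothetical syntax):

@cfg(""" binaryVersion = "2.13" """) def foo: Int = 1 // real annotation, quoted predicate
@if(binaryVersion == "2.13") def foo: Int = 1         // pseudo-annotation, bare config names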

szeiger avatar Jul 04 '19 13:07 szeiger

These annotations are processed logically after the parser but before the typer

Could this express, for example

  • if (binaryVersion > 2.13) import a.X else import b.X
  • if (...) class A extends X else class A extends Y

The cfg macro provides a way to bring config values into Scala terms

Do we need / want that? :-)

lrytz avatar Jul 04 '19 13:07 lrytz

  • In the scheme with the simple predicate language, more complex predicates like binaryVersion > 2.13 need to be split up into a flag that can be checked by the predicate plus some code in the build script to compute that flag (see the sketch below this list). Additional operators could be added to the predicate language (but they would not be user-definable).

  • I don't think normal annotations can be used on imports at the moment, but this should be easy to add (especially if we go with an annotation-like special syntax instead of a real annotation).

  • The macro could replace sbt-buildinfo. We're adding a standard way of defining config variables and passing them to the compiler. I think it makes sense to use this mechanism for reifying them at the term level if we already have it.
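
Put together, the first two points might look like this in source (a sketch with a hypothetical build-computed flag scala213plus, assuming @if is allowed on imports):

@if(scala213plus)  import a.X
@if(!scala213plus) import b.X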

szeiger avatar Jul 04 '19 13:07 szeiger

Thanks!

Can you think of cases where the annotation-based syntax would not work well enough? My example above is a superclass; that could be worked around with a type alias. But for example if I want to add a parent conditionally (and not extend anything in the other case), I don't see how that could be done (except making two copies of the entire class).

lrytz avatar Jul 04 '19 14:07 lrytz

But for example if I want to add a parent conditionally (and not extend anything in the other case)

You can always extend AnyRef or Any. This doesn't work anymore if you need to pass arguments to the superclass. You'd have to write two separate versions.
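
For the simple case, a sketch (with a hypothetical flag newParent):

@if(newParent)  class A extends X
@if(!newParent) class A extends AnyRef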

szeiger avatar Jul 04 '19 14:07 szeiger

Here's my prototype so far: https://github.com/szeiger/scala/tree/wip/preprocessor

I'm not quite happy with the set-based key/value checks. It doesn't feel correct with Scala syntax.

Supporting imports will need a bit of refactoring in the parser. It's not as straightforward to add as I had hoped.

I wanted to try it with collections-compat but discovered that annotations do not work for package objects. This is also a limitation of the parser, so it affects my pseudo-annotations as well. I'm not sure whether this is intentional or a bug. Fixing it should be of the same order of difficulty as supporting imports.

Except for these limitations it should be fully functional.

szeiger avatar Jul 05 '19 19:07 szeiger

The patch has

         //case t @ Annotated(annot, arg) => t

so supporting annotation ascriptions is planned, right?

lrytz avatar Jul 08 '19 06:07 lrytz

I assume it's trivial to implement but didn't get around to testing it yet.

szeiger avatar Jul 08 '19 15:07 szeiger

Looks like the restriction on disallowing annotations in general for package objects is intentional: https://www.scala-lang.org/files/archive/spec/2.13/09-top-level-definitions.html#compilation-units. But since @if is not a real annotation we can special-case it for package objects the same way as for imports.

szeiger avatar Jul 08 '19 15:07 szeiger

The latest update supports imports, package objects and annotated expressions.

szeiger avatar Jul 08 '19 20:07 szeiger

Here's a version of scala-collection-compat that does all the conditional compilation with the preprocessor: https://github.com/szeiger/scala-collection-compat/tree/wip/preprocessor-test. This shows the limits of what is possible. In practice I would probably keep 2.13 completely separate but use conditional compilation for the small differences between 2.11 and 2.12.

szeiger avatar Jul 09 '19 13:07 szeiger

What are the concrete use cases for this?

IMO proposals should always start with a set of use cases, and their design should be driven and guided by how well they solve those use cases.

nafg avatar Jul 12 '19 10:07 nafg

Thanks for the detailed write up! Some quick questions.

How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?

Can you maybe elaborate on the goal below with an example situation where conditional source files have been insufficient in practice?

Conditional compilation which is more fine-grained than conditional source files.

I am concerned that preprocessing introduces one more way to solve the same problem that conditional source files already solve. Conditional source files have their flaws but they work mostly OK with IDE tooling.

olafurpg avatar Jul 12 '19 10:07 olafurpg

I would love this. The biggest pain as library maintainers is having to keep (mostly) redundant branches because we can't do conditionals based on the current Scala version.

How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?

The conditionals should use the value that corresponds to the current compiler version that is set by the IDE?

What are the concrete use cases for this?

Migrating to the new Scala collections is a major one if you use CanBuildFrom and stuff like breakOut in your code.

https://github.com/mdedetrich/scalajson/blob/master/build.sbt#L98 is another example

mdedetrich avatar Jul 12 '19 12:07 mdedetrich

How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?

The same way that different source folders work. An IDE that imports the sbt build (like IntelliJ currently does) would also see the config options that are passed to scalac (and computed in the build in the same way as the source folders).

szeiger avatar Jul 12 '19 12:07 szeiger

The motivating use case is indeed the collections migration, where we see the need for separate source files in many cases. I neglected to put that into the proposal because the proposal ("we should have a preprocessor for conditional compilation") already existed when I adopted it to create a design.

Here is a version of scala-collection-compat that takes my current preprocessor prototype to the limit: https://github.com/szeiger/scala-collection-compat/tree/wip/preprocessor-test. Note that this is not a style I would recommend. For collection-compat, assuming that 2.11 and 2.12 already had the preprocessor, I would have used the preprocessor to combine and simplify the 2.11 and 2.12 versions (which are very similar) and kept the entirely different 2.13 version separate.

szeiger avatar Jul 12 '19 12:07 szeiger

I am personally somewhat doubtful of this. Cross-version sources have worked well enough, are supported by every build tool (sbt, Mill, our Bazel build at work), and encourage the best practice of keeping your version-specific stuff encapsulated in a small number of files rather than scattering if-defs all over the codebase.

"No change to build toolchains required. Positions work normally" is already the case right now with version-dependent folders. No change to anyone's build toolchain is necessary, everything already works, and positions are correct. Even IDE support works great, better than it does in #ifdef-heavy C#/C++ code anyway!

Not mentioned in this proposal is Scala.js. The Scala.js community has been working with platform-specific source folders forever. It's worked well. I don't think I've heard any great groundswell of demand for #ifdef preprocessor directives (I believe only one project in the past 5 years cared enough to even try implementing them).

lihaoyi-databricks avatar Jul 12 '19 12:07 lihaoyi-databricks

Here's a summary of my AST-based preprocessor prototype (https://github.com/szeiger/scala/tree/wip/preprocessor). It's the same approach that Rust uses for the cfg attribute and macro.

Syntax

Conditional compilation is done with a pseudo-annotation called @if. Note that if is a keyword, which makes this illegal as regular annotation syntax (you would have to write @`if` instead). It takes one argument, which is a preprocessor predicate (see below).

@if can be used in the following places:

  • Wherever normal annotations are allowed
  • In front of package objects
  • In front of package p { ... } style package definitions (but not package p; ...)
  • In front of import statements

Note that the source code must be completely parseable into an AST before preprocessing. For example, this is allowed:

@if(scala213) val x = 1
@if(!scala213) val x = 0

Whereas this is not:

val x = (1: @if(scala213))
        (0: @if(!scala213))

Configuration Options

Configuration options consist of string keys associated with a set of string values. They are passed to scalac with -Ckey=value. In sbt they can be set via scalacOptions. For example:

scalacOptions ++= Seq("-Cfeature=foo", "-Cfeature=bar")

This gives the config option feature the value Set("foo", "bar").

Preprocessor Predicates

Predicates for the @if pseudo-annotation are parsed as Scala expressions (like any other annotation argument) but they are processed by a special interpreter which supports only a limited set of expression forms (examples follow the list):

  • Identifier == String Literal: Evaluates to true if the config option designated by the identifier has the string literal as one of its values, false otherwise.
  • Identifier: Evaluates to true if the config option designated by the identifier is defined (i.e. it has a non-empty set of values), false otherwise.
  • Boolean expressions on predicates using &&, || and ! with the usual meaning.
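
For example, with hypothetical config options scala213 and platform:

@if(scala213) def added: Int = 1                        // existence check
@if(platform == "jvm" && !scala213) def legacy: Int = 0 // value check and negation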

Preprocessing

The preprocessor runs in the new preprocessor phase directly after the parser. It evaluates all preprocessor annotations, removing both the annotations themselves and all trees for which the predicates evaluate to false. The effect is the same as if the annotated part had not been there in the first place. No typechecking is attempted on the removed parts and no names are entered into symbol tables.

Reifying Configuration Options

The scala.sys.cfg macro can be used to expand a config option at compile-time into its values. For example, using the previous definition of feature,

val features = sys.cfg("feature")

is expanded into

val features = Set[String]("foo", "bar")

szeiger avatar Jul 12 '19 15:07 szeiger

I don’t think having a preprocessor is a good idea. It adds another meta-language layer above the Scala language, which increases the cognitive load for reading source code. Also, unless such a meta-language is as powerful as a general-purpose language (which is something we don’t want!), we will still need to rely on the build system to accommodate some specific needs that are not supported by the few preprocessor directives of the system. I believe such a system would have very limited applicability compared to its cost.

julienrf avatar Jul 12 '19 19:07 julienrf

@szeiger can you spill a bit more ink on that motivating use case? And are there others?

For instance,

  1. What about the collections migration requires it? Is it a special case? Is it a design flaw?
  2. What is wrong with current solutions, which clearly exist, for example separate source directories? Is this an overall win over it, and why?
  3. Who is affected by it? How large of an audience is it for?

nafg avatar Jul 12 '19 20:07 nafg

I am personally somewhat doubtful of this. Cross-version sources have worked well enough, are supported by every build tool (sbt, Mill, our Bazel build at work), and encourage the best practice of keeping your version-specific stuff encapsulated in a small number of files rather than scattering if-defs all over the codebase.

In my experience, cross-version sources have resulted in massive amounts of code duplication. You basically have to duplicate the entire source file(s) minus the difference you are targeting. In some cases you can get around this by using traits, but that then opens up other problems.

mdedetrich avatar Jul 12 '19 20:07 mdedetrich

@som-snytt I don't mean to come across that way, but shouldn't said research be done before making such a significant investment?

@mdedetrich that's interesting. Can you explain why separate directories results in so much duplication?

Part of why I'm asking is because the deeper you understand a problem, the more you understand the solution space. There could be solutions that haven't been thought of or explored fully.

But partially, it's because if we don't write these things down, people won't appreciate the direction being taken.

Anyway, I seem to recall such a case mentioned recently by @sjrd on Gitter that may have pointed in this direction. I'm not sure it generalizes, though.

nafg avatar Jul 12 '19 20:07 nafg

Building for two platforms should mean just building two branches.

I've tried this; it's not a great solution. You lose all sorts of things doing things in multiple Git branches vs. just having separate folders:

  • No more parallel builds in your build tool
  • No more find-and-replace in your editor
  • No more search on GitHub, which only lets you search master
  • No dependencies across the cross-axis: what if I want my Scala-JVM server to depend on my Scala.js executable, with shared code? What if my Scala 2.12 deploy script needs to use the assembly of my Scala 2.11 Spark 2.3 job?
  • No more just running publishAll without interleaving your publishing with a whole lot of git-fu

Using git branches for cross versioning always sounds great initially, but you are giving up a lot of commonly-used functionality and workflows in order to get there.


Version-specific source files are pretty easy, and they are the de facto convention throughout the entire community. Matthew doesn't specify why he doesn't like splitting things into version-specific traits, but I've done it for years over a million lines of cross-versioned code across three different cross axes and it's never been particularly difficult.

I don't think it would be an exaggeration to say I maintain more cross-built Scala code than anyone else in the community. Even at the extremes, like when Ammonite has to deal with incompatibilities in path-dependent nested-trait-cakes inside scala.tools.nsc.Global, it has always broken down pretty easily into separate static methods or separate traits for each cross-version. Maybe there are cases where you actually do have to duplicate large amounts of code, but I haven't seen them.

IMO this proposal as stated suffers from the same problem as many others: the proposed solution is clearly stated, well analyzed and thoroughly studied, but the problem it is trying to solve is relegated to a single throwaway sentence without elaboration. That seems very much like the wrong way of approaching things, and is an approach very prone to ending up with a solution looking for a problem.

lihaoyi-databricks avatar Jul 12 '19 21:07 lihaoyi-databricks

Matthew doesn't specify why he doesn't like splitting things into version-specific traits, but I've done it for years over a million lines of cross-versioned code across three different cross axes and it's never been particularly difficult.

Well, for starters, it's a workaround. Traits are a language-level abstraction for structuring your code. They are not designed for dealing with breaking differences between Scala versions. They are used this way because it's the only sane way to handle differences between platforms and Scala versions (apart from Git branches, which you already expanded upon). They are kind of like pollution: they wouldn't normally be there if it weren't for the fact that you were targeting another Scala version/platform which doesn't support feature X.

There are also difficulties in using traits: they can cause issues with binary compatibility in non-trivial circumstances. And they also don't abstract over everything cleanly, as shown in the repo I linked earlier.

mdedetrich avatar Jul 12 '19 21:07 mdedetrich

Why is "a language level abstraction for structuring your code" not designed for this any more than any particular use case? They aren't designed for anything specific, they are a tool to use when desired. For any use case you can say "they wouldn't normally be there if it wasn't for $USE_CASE".

So far I'm not convinced there is any sane way to handle the differences, including this proposal.

Can't we allocate the same effort to getting TASTY shipped instead? Wouldn't that solve many of the same problems, including the binary compatibility problem of traits @mdedetrich mentions?

nafg avatar Jul 12 '19 23:07 nafg

Well, for starters, it's a workaround. Traits are a language-level abstraction for structuring your code. They are not designed for dealing with breaking differences between Scala versions. They are used this way because it's the only sane way to handle differences between platforms and Scala versions (apart from Git branches, which you already expanded upon). They are kind of like pollution: they wouldn't normally be there if it weren't for the fact that you were targeting another Scala version/platform which doesn't support feature X.

Nothing in here sounds problematic: we have a problem, and we have not only a working solution, but as you say a sane solution.

Binary compatibility is a general Scala problem, but I'm not going to suggest we start adding language features that duplicate trait functionality because traits cause binary compatibility issues.

You haven't actually explained what is wrong with your repo linked above. That build.sbt file looks 100% fine to me.

lihaoyi-databricks avatar Jul 13 '19 01:07 lihaoyi-databricks

If this is supposed to help abstract over Scala versions, I don't think embedding it into Scala would work. Either way, I would much prefer a centralized way (like folders) over scattering ifdefs all over the code.

OlegYch avatar Jul 13 '19 02:07 OlegYch