Scala Preprocessor / Conditional Compilation
Scala has done quite well so far without any preprocessor, but in some situations it would be quite handy to just drop an `#ifdef` or `#include` into the source code. Let's resist this temptation (of using cpp) and focus instead on solving the actual problems that we have without adding too much complexity.
Goals
- Conditional compilation which is more fine-grained than conditional source files.
- Well integrated into the compiler: No change to build toolchains required. Positions work normally.
Non-goals
- Lexical macros
- Template expansion
- Advanced predicate language
Status quo in Scala
- Conditional source files
- Code generation
- Various code generation tools in use: Plain Scala code, FMPP, M4, etc.
- https://github.com/sbt/sbt-buildinfo as a lightweight alternative for getting config values into source code
All of these require build tool support. Conditional source files are supported out of the box (for simple cross-versioning in sbt) or are relatively easy to add manually. sbt-buildinfo is also ready to use. Code generation is more difficult to implement; different projects use various ad-hoc solutions.
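For context, the usual sbt convention for conditional source files looks roughly like this (the `scala-2.13+`/`scala-2.13-` directory names are a community convention, not an sbt builtin):

```scala
// build.sbt: add a version-specific source directory per Scala version
Compile / unmanagedSourceDirectories += {
  val base = (Compile / sourceDirectory).value
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, n)) if n >= 13 => base / "scala-2.13+"
    case _                       => base / "scala-2.13-"
  }
}
```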
Conditional compilation in other languages
C
Using the C preprocessor (cpp):
- Powerful
- Low-level
- Error-prone (macro expansion, hygiene)
- Solves many problems (badly) that Scala doesn't have (e.g. imports, macros)
HTML
- Allows simple conditional processing
- Dangerous errors possible when not supported by tooling (because it appears to be backwards compatible but is really not)
Rust
Built-in conditional compilation:
- Predicates are limited to key==value checks, exists(key), any(ps), all(ps), not(p)
- Configuration options set by the build system (some automatically, like platform and version, others user-definable)
- Keys are not unique (i.e. every key is associated with a set of values)
- Three ways of conditional compilation:
  - The `cfg` attribute (an annotation in Scala), allowed where other attributes are allowed
  - The `cfg_attr` attribute generates attributes conditionally
  - The `cfg` macro includes config values in the source code
- Syntactic processing: excluded source code must be parseable
Java
- No preprocessor or conditional compilation support
- `static final boolean` flags can be used for conditional compilation of well-typed code
- Various preprocessing hacks based on preprocessor tools or conditional comments are used in practice
Haskell
Conditional compilation is supported by Cabal:
- Using cpp with macros provided by Cabal for version-specific compilation
Design space
At which level should conditional compilation work?
- Before parsing: This keeps the config language separate from Scala. It is the most powerful option, allowing arbitrary pieces of source code to be made conditional (or replaced by config values), but it is also difficult to reason about and can be abused to create very unreadable code.
- After lexing: This option is taken by cpp (at least conceptually, by using the same lexer as C, even when implemented by a separate tool). It avoids some of the ugly corner cases of the first option (like being able to make the beginning or end of a comment conditional) while still being very flexible. An implementation for Scala would probably be limited to the default tokenizer state (i.e. no conditional compilation within XML expressions or string interpolation). Tokenization rules do not change very often or very much, so cross-compiling to multiple Scala versions should be easy.
- After parsing: This is the approach taken by Rust. It limits what can be made conditional (e.g. only single methods but not groups of multiple methods with a single directive) and requires valid syntax in all conditional parts. It cannot be used for version-dependent compilation that requires new syntax not supported by the older versions. An additional concern for Scala is the syntax: using annotations like in Rust is possible, but it would break the existing Scala convention that annotations must not change the interpretation of source code. It is also much harder to justify now (rather than from the beginning, when designing a new language) because old tools would misinterpret source code that uses this new feature.
- After typechecking: This is too limiting in practice and can already be implemented (either using macros or with Scala's optimizer and compile-time constants, just like in Java; a sketch follows below).
From my experience of cross-compiling Scala code and using conditional source directories, I think that option 3 is sufficiently powerful for most use cases. However, if we have to add a new syntax for it anyway (instead of using annotations), option 2 is worth considering.
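For reference, a minimal sketch of the "after typechecking" approach that already works today, analogous to Java's `static final boolean` flags (names are made up; note that both branches must typecheck):

```scala
object BuildConfig {
  final val debug = false // a literal final val is a compile-time constant
}

object Log {
  def apply(msg: => String): Unit =
    // The condition folds to a constant, so the dead branch can be
    // eliminated by the compiler/optimizer, as in Java.
    if (BuildConfig.debug) println(msg)
}
```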
Which features do we need?
Rust's `cfg` attribute + macro combination looks like a good solution for most cases. I don't expect a big demand for conditional annotations, so we can probably skip `cfg_attr`. The `cfg` macro can be implemented as a (compiler-intrinsic) macro in Scala; the attribute will probably require a dedicated syntax.
Sources of config options
Conditions for conditional compilation can be very complex. There are two options where this complexity can be expressed:
- Keep the predicates in the Scala sources simple (e.g. only key==value checks), requiring the additional logic to be put into the build definition.
- Or keep the build definition simple and allow more complexity in the predicates.
I prefer the first option. We already have a standard build tool which allows arbitrary Scala code to be run as part of the build definition. Other build tools have developed scripting support, too. The standalone `scalac` tool would not have to support anything more than allowing configuration options to be set from the command line. We should consider some predefined options, but even in seemingly simple cases (like the version number) this could quickly lead to demand for a more complex predicate language.
Looks really good, Stefan!
Do you think you could expand a bit on what is meant by Rust's cfg attribute and macro behaviour? Either just describe it or better yet with examples. Thanks!
Yes, very nice writeup! Thanks for doing the hard work and not just dumping out some syntax ideas :-)
The `cfg` annotation (or "attribute" in Rust) conditionally enables a piece of code (where an attribute is allowed, e.g. a function definition but not arbitrary places). In Scala it could be something like this:

```scala
@cfg(""" binaryVersion = "2.13" """)
def foo: Int = ... // 2.13 version

@cfg(""" not(binaryVersion = "2.13") """)
def foo: Int = ... // 2.11/2.12 version
```

`binaryVersion` in this example is a config option. Config options live in a namespace which is distinct from any regular one in Scala code. These annotations are processed logically after parser but before typer (probably not quite so in practice, because I expect you'll need to do some typing just to recognize the name `cfg`), so the disabled versions of the method have to parse but not typecheck.
The `cfg` macro provides a way to bring config values into Scala terms, e.g.

```scala
println("The binary version is " + cfg("binaryVersion"))
```

Values produced by the macro are expanded into literals at compile time.
A possible way to avoid the namer issue (especially at the top level) without too much complexity would be a new annotation-like syntax like `@if(...)`. This would also allow us to avoid the quotes and instead treat all names within the predicate as config names.
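For example, a sketch of that syntax, mirroring the quoted `@cfg` example above (`binaryVersion` resolves in the config namespace, so no string quoting is needed):

```scala
@if(binaryVersion == "2.13")
def foo: Int = ... // 2.13 version
```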
> These annotations are processed logically after parser but before typer

Could this express, for example:

- `if (binaryVersion > 2.13) import a.X else import b.X`
- `if (...) class A extends X else class A extends Y`
> The `cfg` macro provides a way to bring config values into Scala terms

Do we need / want that? :-)
- In the scheme with the simple predicate language, more complex predicates like `binaryVersion > 2.13` need to be split up into a flag that can be checked by the predicate and some code in the build script to compute the flag (see the sbt sketch after this list). Additional operators could be added to the predicate language (but not user-definable).
- I don't think normal annotations can be used on imports at the moment, but this should be easy to add (especially if we go with an annotation-like special syntax instead of a real annotation).
- The macro could replace sbt-buildinfo. We're adding a standard way of defining config variables and passing them to the compiler. I think it makes sense to use this mechanism for reifying them at the term level if we already have it.
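A minimal sbt sketch of that split, assuming the `-Ckey=value` flag syntax from the prototype described later in this thread (`newCollections` is a made-up option name):

```scala
// build.sbt: compute the complex condition in the build definition
// and pass only a simple flag to the compiler
scalacOptions ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, n)) if n >= 13 => Seq("-CnewCollections=true")
    case _                       => Seq.empty
  }
}
```

The predicate in the source then stays a simple check like `@if(newCollections)`.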
Thanks!
Can you think of cases where the annotation based syntax would not work well enough? My example above is a superclass, that could be worked around with a type alias. But for example if I want to add a parent conditionally (and not extend anything in the other case), I don't see how that could be done (except making two copies of the entire class).
> But for example if I want to add a parent conditionally (and not extend anything in the other case)

You can always extend `AnyRef` or `Any`. This doesn't work anymore if you need to pass arguments to the superclass. You'd have to write two separate versions.
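For illustration, a sketch of those two versions using the `@if` pseudo-annotation from the prototype below, factoring the body into a trait so only the class header is duplicated (`scala213`, `X` and `AImpl` are made up):

```scala
@if(scala213)
class A(x: Int) extends X(x) with AImpl // superclass with constructor args on 2.13

@if(!scala213)
class A(x: Int) extends AImpl // no superclass elsewhere

trait AImpl { /* shared body */ }
```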
Here's my prototype so far: https://github.com/szeiger/scala/tree/wip/preprocessor
I'm not quite happy with the set-based key/value checks. It doesn't feel correct with Scala syntax.
Supporting imports will need a bit of refactoring in the parser. It's not as straightforward to add as I had hoped.
I wanted to try it with collections-compat but discovered that annotations do not work for package objects. This is also a limitation of the parser, so it affects my pseudo-annotations as well. I'm not sure if this is intentional or a bug. Fixing it should be on the same order of difficulty as supporting imports.
Except for these limitations it should be fully functional.
The patch has `//case t @ Annotated(annot, arg) => t`, so supporting annotation ascriptions is planned, right?
I assume it's trivial to implement but didn't get around to testing it yet.
Looks like the restriction disallowing annotations on package objects in general is intentional: https://www.scala-lang.org/files/archive/spec/2.13/09-top-level-definitions.html#compilation-units. But since `@if` is not a real annotation, we can special-case it for package objects the same way as for imports.
The latest update supports imports, package objects and annotated expressions.
Here's a version of scala-collection-compat that does all the conditional compilation with the preprocessor: https://github.com/szeiger/scala-collection-compat/tree/wip/preprocessor-test. This shows the limits of what is possible. In practice I would probably keep 2.13 completely separate but use conditional compilation for the small differences between 2.11 and 2.12.
What are the concrete use cases for this?
IMO proposals should always start with a set of use cases, and their design should be driven and guided by how well they solve those use cases.
Thanks for the detailed write up! Some quick questions.
How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?
Can you maybe elaborate on the goal below with an example situation where conditional source files have been insufficient in practice?
> Conditional compilation which is more fine-grained than conditional source files.
I am concerned that preprocessing introduces one more way to solve the same problem that conditional source files already solve. Conditional source files have their flaws but they work mostly OK with IDE tooling.
I would love this; the biggest pain as library maintainers is having to have (mostly) redundant branches because we can't do conditionals based on the current Scala version.
> How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?
The conditionals should use the value that corresponds to the current compiler version that is set by the IDE?
> What are the concrete use cases for this?
Migrating to the new Scala collections is a major one if you use `CanBuildFrom` and stuff like `breakOut` in your code.
https://github.com/mdedetrich/scalajson/blob/master/build.sbt#L98 is another example
> How do you envision that the code editing and navigation experience would work in IDEs for conditionally compiled statements?
The same way that different source folders work. An IDE that imports the sbt build (like IntelliJ currently does) would also see the config options that are passed to scalac (and computed in the build in the same way as the source folders).
The motivating use case is indeed the collections migration where we see the need for separate source files in many cases. I neglected to put that into the proposal because the proposal "we should have a preprocessor for conditional compilation" already existed when I adopted it to create a design.
Here is a version of scala-collection-compat that takes my current preprocessor prototype to the limit: https://github.com/szeiger/scala-collection-compat/tree/wip/preprocessor-test. Note that this is not a style I would recommend. For collection-compat, assuming that 2.11 and 2.12 already had the preprocessor, I would have used the preprocessor to combine and simplify the 2.11 and 2.12 versions (which are very similar) and kept the entirely different 2.13 version separate.
I am personally somewhat doubtful of this. Cross-version sources have worked well enough, are supported by every build tool (SBT, Mill, our Bazel build at work), and encourage the best practice of keeping your version-specific stuff encapsulated in a small number of files rather than scattering if-defs all over the codebase.
> No change to build toolchains required. Positions work normally.

This is already the case right now with version-dependent folders. No change to anyone's build toolchain is necessary - everything already works - and positions are correct. Even IDE support works great, better than it does in `#ifdef`-heavy C#/C++ code anyway!
Not mentioned in this proposal is Scala.js. The Scala.js community has been working with platform-specific source folders forever. It's worked well. I don't think I've heard any great groundswell of demand for `#ifdef` preprocessor directives (I believe only one project in the past 5 years cared enough to even try implementing them).
Here's a summary of my AST-based preprocessor prototype (https://github.com/szeiger/scala/tree/wip/preprocessor). It's the same approach that Rust uses for the `cfg` attribute and macro.
Syntax
Conditional compilation is done with a pseudo-annotation called `@if`. Note that `if` is a keyword, which makes this illegal as regular annotation syntax (you would have to write `` @`if` `` instead). It takes one argument, which is a preprocessor predicate (see below).
`@if` can be used in the following places:

- Wherever normal annotations are allowed
- In front of package objects
- In front of `package p { ... }` style package definitions (but not `package p; ...`)
- In front of `import` statements
Note that the source code must be completely parseable into an AST before preprocessing. For example, this is allowed:

```scala
@if(scala213) val x = 1
@if(!scala213) val x = 0
```

Whereas this is not:

```scala
val x = (1: @if(scala213))
        (0: @if(!scala213))
```
Configuration Options
Configuration options consist of string keys associated with a set of string values. They are passed to scalac with `-Ckey=value`. In sbt they can be set via `scalacOptions`. For example:

```scala
scalacOptions ++= Seq("-Cfeature=foo", "-Cfeature=bar")
```

This gives the config option `feature` the value `Set("foo", "bar")`.
Preprocessor Predicates
Predicates for the `@if` pseudo-annotation are parsed as Scala expressions (like any other annotation argument) but they are processed by a special interpreter which supports only a limited set of expressions (examples follow the list):

- Identifier `==` string literal: evaluates to `true` if the config option designated by the identifier has the string literal as one of its values, `false` otherwise.
- Identifier: evaluates to `true` if the config option designated by the identifier is defined (i.e. it has a non-empty set of values), `false` otherwise.
- Boolean expressions on predicates using `&&`, `||` and `!` with the usual meaning.
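A few examples of the three predicate forms (the config option names are made up):

```scala
@if(feature == "foo")                 // key/value check
def withFoo: Int = 1

@if(feature)                          // key is defined at all
def withAnyFeature: Int = 2

@if(scala213 || !(feature == "bar"))  // boolean combination
def combined: Int = 3
```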
Preprocessing
The preprocessor runs in the new `preprocessor` phase directly after `parser`. It evaluates all preprocessor annotations, removing both the annotations themselves and all trees for which the predicates evaluate to `false`. The effect is the same as if the annotated part had not been there in the first place. No typechecking is attempted on the removed parts and no names are entered into symbol tables.
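For illustration (the `scala213` option name is made up): because removal happens before the typer runs, a disabled definition only has to parse and may freely reference APIs that do not exist on the current version.

```scala
// On 2.13 (-Cscala213=true): uses the 2.13-style to(List);
// the 2.12 variant below is removed before the typer ever sees it.
@if(scala213)
def drain[A](it: Iterator[A]): List[A] = it.to(List)

// On 2.11/2.12: uses the CanBuildFrom-based to[List],
// which would not typecheck on 2.13.
@if(!scala213)
def drain[A](it: Iterator[A]): List[A] = it.to[List]
```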
Reifying Configuration Options
The `scala.sys.cfg` macro can be used to expand a config option at compile time into its values. For example, using the previous definition of `feature`,

```scala
val features = sys.cfg("feature")
```

is expanded into

```scala
val features = Set[String]("foo", "bar")
```
I don’t think having a preprocessor is a good idea. It adds another meta-language layer above the Scala language, which increases the cognitive load for reading source code. Also, unless such a meta-language is as powerful as a general-purpose language (which is something we don’t want!), we will still need to rely on the build system to accommodate some specific needs that are not supported by the few preprocessor directives of the system. I believe such a system would have very limited applicability compared to its cost.
@szeiger can you spill a bit more ink on that motivating use case? And are there others?
For instance,
- What about the collections migration requires it? Is it a special case? Is it a design flaw?
- What is wrong with current solutions, which clearly exist, for example separate source directories? Is this an overall win over it, and why?
- Who is affected by it? How large of an audience is it for?
> I am personally somewhat doubtful of this. Cross-version sources have worked well enough, are supported by every build tool (SBT, Mill, our Bazel build at work), and encourage the best practice of keeping your version-specific stuff encapsulated in a small number of files rather than scattering if-defs all over the codebase.
In my experience, cross-version sources have resulted in massive amounts of code duplication. You basically have to duplicate the entire source file(s) minus the difference you are targeting. In some cases you can get around this by using traits, but that then opens other problems.
@som-snytt I don't mean to come across that way, but shouldn't said research be done before making such a significant investment?
@mdedetrich that's interesting. Can you explain why separate directories results in so much duplication?
Part of why I'm asking is because the deeper you understand a problem, the more you understand the solution space. There could be solutions that haven't been thought of or explored fully.
But partially, it's because if we don't write these things down, people won't appreciate the direction being taken.
Anyway I seem to recall such a case mentioned recently by @sjrd on gitter that may have pointed in this direction. I'm not sure it generalizes though.
> Building for two platforms should mean just building two branches.
I've tried this, and it's not a great solution. You lose all sorts of things doing cross-versioning in multiple git branches vs. just having separate folders:
- No more parallel builds in your build tool
- No more find-and-replace in your editor
- No more search on GitHub, which only lets you search `master`
- No dependencies across the cross-axis: what if I want my Scala-JVM server to depend on my Scala.js executable, with shared code? What if my Scala 2.12 deploy script needs to use the assembly of my Scala 2.11 Spark 2.3 job?
- No just running `publishAll` without interleaving your publishing with a whole lot of git-fu
Using git branches for cross versioning always sounds great initially, but you are giving up a lot of commonly-used functionality and workflows in order to get there.
Version-specific source files are pretty easy, and they are the de facto convention throughout the entire community. Matthew doesn't specify why he doesn't like splitting things into version-specific traits, but I've done it for years over a million lines of cross-versioned code across three different cross axes and it's never been particularly difficult.
I don't think it would be an exaggeration to say I maintain more cross-built Scala code than anyone else in the community. Even at the extremes, like when Ammonite has to deal with incompatibilities in path-dependent nested-trait-cakes inside `scala.tools.nsc.Global`, it's always broken down pretty easily into separate static methods or separate traits for each cross-version. Maybe there are cases where you actually do have to duplicate large amounts of code, but I haven't seen them.
IMO this proposal as stated suffers from the same problem as many others: the proposed solution is clearly stated, well analyzed and thoroughly studied, but the problem it is trying to solve is relegated to a single throwaway sentence without elaboration. That seems very much like the wrong way of approaching things, and is an approach very prone to ending up with a solution looking for a problem
> Matthew doesn't specify why he doesn't like splitting things into version-specific traits, but I've done it for years over a million lines of cross-versioned code across three different cross axes and it's never been particularly difficult.
Well, for starters, it's a workaround. Traits are a language-level abstraction for structuring your code. They're not designed for dealing with breaking differences between different Scala versions. They're used this way because it's the only sane way to handle differences between platforms and Scala versions if you are able to do so (apart from Git branches, which you already expanded upon). They are kind of like pollution: they wouldn't normally be there if it wasn't for the fact that you were targeting another Scala version/platform which doesn't support feature X.
There are also difficulties in using traits: they can cause issues with binary compatibility in non-trivial circumstances. And they also don't abstract over everything cleanly, as shown in the repo I linked earlier.
Why is "a language level abstraction for structuring your code" not designed for this any more than any particular use case? They aren't designed for anything specific, they are a tool to use when desired. For any use case you can say "they wouldn't normally be there if it wasn't for $USE_CASE".
So far I'm not convinced there is any sane way to handle the differences, including this proposal.
Can't we allocate the same effort to getting TASTY shipped instead? Wouldn't that solve many of the same problems, including the binary compatibility problem of traits @mdedetrich mentions?
> Well, for starters, it's a workaround. Traits are a language-level abstraction for structuring your code. They're not designed for dealing with breaking differences between different Scala versions. They're used this way because it's the only sane way to handle differences between platforms and Scala versions if you are able to do so (apart from Git branches, which you already expanded upon). They are kind of like pollution: they wouldn't normally be there if it wasn't for the fact that you were targeting another Scala version/platform which doesn't support feature X.
Nothing in here sounds problematic: we have a problem, and we have not only a working solution, but as you say a sane solution.
Binary compatibility is a general Scala problem, but I'm not going to suggest we start adding language features that duplicate trait functionality because traits cause binary compatibility issues.
You haven't actually explained what is wrong with your repo linked above. That build.sbt file looks 100% fine to me.
If this is supposed to help abstract over Scala versions, I don't think embedding it into Scala would work. Either way, I would much prefer a centralized way (like folders) over scattering ifdefs all over the code.