Ammonite
Discussion - JARs and native executables from scripts
Opening this to know if there's some interest and discuss details of a possible implementation.
Basically, what I'd like to discuss is adding the possibility to do things like
$ amm script.sc --jar
$ ./script.jar
That makes it easier to copy scripts to one's bin directory (without having to copy the script itself and all the scripts it imports), and lets one generate them once (with the right Ammonite version) and then forget about the exact invocation used to generate them.
(I remember seeing such a feature being hinted at some time ago, maybe by @lihaoyi, but can't find out where…)
On top of that, we could add the possibility to convert these JARs to native executables, via GraalVM and / or scala-native, for faster start-up times:
$ amm script.sc --native-image
$ ./script
$ amm script.sc --scala-native
$ ./script
What kind of JAR should we generate?
Even though assemblies seem like a natural choice, I'd argue coursier bootstraps make a better candidate.
Assemblies
Assemblies (a.k.a. "fat JARs" or "uber JARs") are quite common when one wants to package a JVM application in a single JAR. They basically take all the JARs of the classpath and merge their content. Yet they suffer from some drawbacks, most notably:
- some collisions may arise when merging JARs content
- it's hard to make sense of what's in an assembly once it's packaged: we can't really know which dependencies are in it, what their versions are, etc.
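To make the first drawback concrete, here is a minimal sketch (not part of Ammonite or coursier) that lists the entry names appearing in more than one JAR of a classpath, which are exactly the entries an assembly has to merge or drop. The `JarCollisions` object and its method names are made up for illustration, and it assumes Scala 2.13 for `scala.jdk.CollectionConverters`.

```scala
import java.io.File
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._

// Illustrative helper only: the names here are hypothetical.
object JarCollisions {

  /** Entry names (directories excluded) found in more than one of the given JARs,
    * mapped to the JARs that contain them. */
  def collisions(jars: Seq[File]): Map[String, Seq[File]] = {
    val entries = for {
      jar  <- jars
      name <- entryNames(jar)
    } yield name -> jar

    entries
      .groupBy(_._1)
      .map { case (name, pairs) => name -> pairs.map(_._2).distinct }
      .filter { case (_, owners) => owners.size > 1 }
  }

  // Reads the names of all file entries of a JAR, closing it afterwards.
  private def entryNames(jar: File): Seq[String] = {
    val zip = new ZipFile(jar)
    try zip.entries().asScala.filterNot(_.isDirectory).map(_.getName).toList
    finally zip.close()
  }
}
```

Typical hits on real classpaths are files such as `reference.conf` or entries under `META-INF/`, for which an assembly needs an explicit merge strategy.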
Nested JARs
To circumvent these shortcomings, Spring Boot, but also coursier, allow nesting JARs, so that they don't have to be merged:
$ cs bootstrap ammonite:2.0.4 --standalone
$ unzip -l amm
…
146682 01-14-2020 05:24 coursier/bootstrap/launcher/jars/ammonite_2.13.1-2.0.4.jar
178031 01-14-2020 05:24 coursier/bootstrap/launcher/jars/ammonite-terminal_2.13-2.0.4.jar
131203 01-14-2020 05:24 coursier/bootstrap/launcher/jars/ammonite-ops_2.13-2.0.4.jar
120047 01-14-2020 05:24 coursier/bootstrap/launcher/jars/ammonite-util_2.13-2.0.4.jar
206114 01-14-2020 05:24 coursier/bootstrap/launcher/jars/ammonite-runtime_2.13.1-2.0.4.jar
…
coursier bootstraps
One shortcoming of both assemblies and nesting JARs is the size of the resulting JAR:
$ ls -lh amm
-rwxr-xr-x 1 alex staff 35M oct 28 14:32 amm
To make such JARs smaller, coursier uses the fact that most of these nested JARs come straight from public repositories, such as Maven Central. Instead of embedding the JARs themselves, it can embed their URLs, like https://repo1.maven.org/maven2/com/lihaoyi/ammonite-repl_2.13.1/2.0.4/ammonite-repl_2.13.1-2.0.4.jar:
$ rm -f amm
$ cs bootstrap ammonite:2.0.4
$ ls -lh amm
-rwxr-xr-x 1 alex staff 31K oct 28 14:32 amm
$ unzip -p amm coursier/bootstrap/launcher/bootstrap-jar-urls
…
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp-api_2.13.1/2.0.4/ammonite-interp-api_2.13.1-2.0.4-sources.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp-api_2.13.1/2.0.4/ammonite-interp-api_2.13.1-2.0.4.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp_2.13.1/2.0.4/ammonite-interp_2.13.1-2.0.4-sources.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp_2.13.1/2.0.4/ammonite-interp_2.13.1-2.0.4.jar
…
Upon startup, this JAR ensures all these URLs are available in the coursier cache, and simply loads them from there.
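As a rough illustration of that startup step, the sketch below downloads each URL into a local cache directory unless it is already there, then builds a class loader from the cached files. This is not coursier's actual launcher code: `UrlBootstrap`, its methods, and the `<root>/<scheme>/<host>/<path>` cache layout used here are assumptions made for the example, and it ignores checksums, locking, and authentication.

```scala
import java.net.{URI, URL, URLClassLoader}
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical sketch of a URL-based bootstrap, not coursier's implementation.
object UrlBootstrap {

  /** Local path for a URL under the cache root: <root>/<scheme>/<host>/<path>. */
  def cachePath(cacheRoot: Path, url: String): Path = {
    val uri = URI.create(url)
    cacheRoot
      .resolve(uri.getScheme)
      .resolve(uri.getHost)
      .resolve(uri.getPath.stripPrefix("/"))
  }

  /** Downloads the URL into the cache unless a file is already there. */
  def ensureCached(cacheRoot: Path, url: String): Path = {
    val dest = cachePath(cacheRoot, url)
    if (!Files.exists(dest)) {
      Files.createDirectories(dest.getParent)
      // Download to a temporary file first, so that a partial download
      // is never mistaken for a complete JAR.
      val tmp = Files.createTempFile(dest.getParent, "download", ".part")
      try {
        val in = URI.create(url).toURL.openStream()
        try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
        finally in.close()
        Files.move(tmp, dest, StandardCopyOption.REPLACE_EXISTING)
      } finally Files.deleteIfExists(tmp)
    }
    dest
  }

  /** Class loader over the cached copies of all the given URLs. */
  def classLoader(cacheRoot: Path, urls: Seq[String], parent: ClassLoader): URLClassLoader = {
    val jars: Array[URL] = urls.map(u => ensureCached(cacheRoot, u).toUri.toURL).toArray
    new URLClassLoader(jars, parent)
  }
}
```

On a warm cache nothing is downloaded, so start-up cost is limited to checking that the files exist.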
In coursier, these JARs are generated by the coursier-launcher library, which also allows mixing nested JARs and URLs, so that JARs from public dependencies can be embedded as URLs, while others can be nested.
For Ammonite, I'd propose to use the coursier-launcher library too. By default, JARs from public URLs can be embedded as URLs, while the JAR containing the byte code resulting from compiling the script itself can be nested. The resulting JARs would have a minimal size, making them fast to generate and handy to move around.
Optionally, --standalone and --assembly options could be supported by Ammonite, to nest all JARs or generate an assembly.
Ammonite API uses
Scripts may use the interpreter API, like interp.load.ivy("org" %% "name" % "ver"), to interact with Ammonite itself. Once the script is packaged as a JAR, the Ammonite runtime isn't there anymore to handle such calls.
I'd propose that these calls either throw or have no effect, and that their use be discouraged, just like for BSP support.
Prior to actually running the script, the main class of the generated JAR can set up a dummy InterpAPI implementation at ammonite.interp.api.InterpBridge.value0 (where interp from the user code comes from).
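To illustrate the "throw or have no effect" choice, here is a simplified sketch. It does not use Ammonite's real types: `LoadApi` is a hypothetical, heavily reduced stand-in for the relevant part of InterpAPI, and `Bridge.value0` only plays the role that ammonite.interp.api.InterpBridge.value0 plays in the real code.

```scala
// Hypothetical stand-ins, NOT Ammonite's actual API.
object BridgeSketch {

  // Reduced stand-in for the interpreter API that scripts reach through `interp`.
  trait LoadApi {
    def ivy(coordinates: String*): Unit
  }

  // Option 1: fail loudly, so packaged scripts don't silently miss dependencies.
  final class ThrowingLoad extends LoadApi {
    def ivy(coordinates: String*): Unit =
      throw new UnsupportedOperationException(
        "interp.load.ivy is not available in a packaged script: " + coordinates.mkString(", ")
      )
  }

  // Option 2: silently ignore the call.
  final class NoOpLoad extends LoadApi {
    def ivy(coordinates: String*): Unit = ()
  }

  // Plays the role of InterpBridge.value0 in this sketch.
  object Bridge {
    @volatile var value0: LoadApi = null
  }

  /** What the generated main class would do before running the script. */
  def install(strict: Boolean): Unit =
    Bridge.value0 = if (strict) new ThrowingLoad else new NoOpLoad
}
```

The real InterpAPI has more members than this, so an actual dummy implementation would have to make the same choice for each of them.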
GraalVM
Just like sbt-native-image or the coursier CLI itself, Ammonite could fetch GraalVM archives via the coursier CLI or coursier-jvm (whose dependency graph could be made thinner…), ensure native-image is installed (via gu install native-image), and generate native images with it.
Alongside that, it could also allow users to pass a GraalVM installation root directory.
One point to pay attention to is the options users might want to pass to native-image (such as these). Sensible options for the classes of the standard library could be passed by default, but users should be allowed to pass their own options. Maybe these could be read from comments in the main script, or extra arguments passed on the command line could be forwarded to native-image, like
$ amm script.sc --graalvm \
--enable-all-security-services # this one is for native-image
$ ./script
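For the other idea, reading options from comments in the main script, a minimal sketch could look like the following. The `//> native-image:` directive syntax is purely hypothetical (nothing like it exists in Ammonite today), as are the `NativeImageOptions` object and its method.

```scala
// Hypothetical sketch: the directive syntax below is an assumption, not an Ammonite feature.
object NativeImageOptions {

  // Made-up comment prefix marking options meant for native-image.
  private val Directive = "//> native-image:"

  /** Collects, in order, the whitespace-separated options found on directive lines. */
  def fromScript(source: String): Seq[String] =
    source.linesIterator
      .map(_.trim)
      .filter(_.startsWith(Directive))
      .flatMap(_.stripPrefix(Directive).trim.split("\\s+").toList)
      .filter(_.nonEmpty)
      .toList
}
```

Options collected this way could then be appended to the default ones, with command-line arguments taking precedence if both mechanisms were supported.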
Scala Native
The coursier CLI can already generate Scala Native executables (this requires the JVM launcher of the coursier CLI, not the native one):
$ coursier bootstrap --native io.get-coursier::echo::1.0.4 -o echo
$ ./echo foo
foo
The upcoming Scala 2.12 support in Scala Native makes it possible to bring that feature to Ammonite, on top of the packaging capabilities above.
The code required to call Scala Native is minimal. Being able to call multiple versions of Scala Native adds a bit of complexity (currently, coursier publishes one module per Scala Native version, such as 0.3.0 and 0.4.0-M2, and fetches either one prior to generating an executable).
For that to work, all the JARs of the classpath needed to run the script need to be cross-compiled for Scala Native. Ideally, assuming the script is run with --thin, only the dependencies of com.lihaoyi:::ammonite-interp-api need to be cross-compiled. These are:
$ cs resolve com.lihaoyi:ammonite-interp-api_2.13.3:2.2.0
com.lihaoyi:ammonite-interp-api_2.13.3:2.2.0:default
com.lihaoyi:ammonite-ops_2.13:2.2.0:default
com.lihaoyi:ammonite-util_2.13:2.2.0:default
com.lihaoyi:fansi_2.13:0.2.9:default
com.lihaoyi:geny_2.13:0.6.2:default
com.lihaoyi:os-lib_2.13:0.7.1:default
com.lihaoyi:pprint_2.13:0.5.9:default
com.lihaoyi:sourcecode_2.13:0.2.1:default
io.get-coursier:interface:0.0.21:default
net.java.dev.jna:jna:5.3.1:default
org.jline:jline:3.15.0:default
org.scala-lang:scala-compiler:2.13.3:default
org.scala-lang:scala-library:2.13.3:default
org.scala-lang:scala-reflect:2.13.3:default
org.scala-lang.modules:scala-collection-compat_2.13:2.1.2:default
In practice, if the user code doesn't reference non-cross-compiled libraries, we should be able to build a valid Scala Native executable for it. To make that work better, we can try either:
- resuming the work of https://github.com/lihaoyi/Ammonite/pull/941, to strip more user-facing libraries when --thin is passed, or
- just passing fewer libraries to scalac when compiling scripts for Scala Native (so that Scala Native scripts would be compiled slightly differently from non-Scala Native ones)
I think this is a very useful feature, especially the thin binaries generated by coursier bootstrap. This will simplify distributing Ammonite scripts to users.
I'd say the first thing we should implement are assemblies, but we should do so in a way that leaves the door open for further backends in future: coursier-bootstraps, scala-native binaries, scala.js binaries (why not?), an unpacked folder-full-of-classfiles, etc.
I imagine that most of the work necessary to get pre-compilation would be shared regardless of how we end up packaging the final output. Assemblies, for all their downsides, are the dumbest and most broadly familiar of any of the above options, so they should definitely come first, but that doesn't mean we can't provide alternatives once we get the dumb-straightforward thing working.
> I'd say the first thing we should implement are assemblies, but we should do so in a way that leaves the door open for further backends in future: coursier-bootstraps, scala-native binaries, scala.js binaries (why not?), an unpacked folder-full-of-classfiles, etc.
Sure, why not. Scala.JS output would be nice too, yes!
> I imagine that most of the work necessary to get pre-compilation would be shared regardless of how we end up packaging the final output.
Indeed, I think we'll mainly need a module with a main class, able to create and set up bridge implementations, then load the script entrypoint class and call its $main method. For that module to also work from Scala Native and Scala.JS, we should probably use portable-scala-reflect.
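On the JVM, the "load the entrypoint class and call its $main method" part could be sketched with plain Java reflection as below. This is an assumption-laden illustration: `ScriptLauncher` is a made-up name, `DemoScript` stands in for a compiled script wrapper, and the no-argument `$main` signature is assumed rather than taken from Ammonite's generated code. A version that also works on Scala Native and Scala.JS would go through portable-scala-reflect instead.

```scala
// JVM-only sketch using java.lang.reflect; names and signatures are assumptions.
object ScriptLauncher {

  /** Loads the Scala object `objectName` and calls its no-argument method `methodName`. */
  def run(loader: ClassLoader, objectName: String, methodName: String = "$main"): AnyRef = {
    // A Scala object `Foo` is backed by the class `Foo$`,
    // whose singleton instance sits in the static field MODULE$.
    val cls      = Class.forName(objectName + "$", true, loader)
    val instance = cls.getField("MODULE$").get(null)
    cls.getMethod(methodName).invoke(instance)
  }
}

// Stand-in for a compiled script entrypoint (hypothetical shape).
object DemoScript {
  def $main(): String = "ran"
}
```

Bridge implementations would be set up before the call to `run`, so that the script body finds them in place when it starts.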
For Scala Native and Scala.JS, narrowing the scope of what needs to be cross-compiled might need a bit of work too (see the very last point I mentioned in my original comment)
Duplicate of #919
I'm glad to see the idea is finally getting interest ;-)
> Duplicate of #919
I think that's what I was referring to, but couldn't recall, in the OP. (I had no recollection of mentions of native stuff though.)
Ideally, there should be a shebang that can be added to Scala script files that automatically compiles them to native executables, caches this binary, and runs it. I don’t know how much more expensive generating a native binary is though. I imagine Scala Native should be fast, and GraalVM slow?
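The caching part of such a shebang runner could be sketched as follows: the binary is stored under a key derived from the script's content and the tool version, and it is rebuilt only when that key changes. `BinaryCache` and its signatures are hypothetical, the actual native compilation is left to a `build` callback, and the key ignores imported scripts, which a real implementation would have to hash too.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical sketch of a content-addressed cache for compiled scripts.
object BinaryCache {

  /** Hex-encoded SHA-256 of the tool version and the script content. */
  def key(scriptBytes: Array[Byte], toolVersion: String): String = {
    val md = MessageDigest.getInstance("SHA-256")
    md.update(toolVersion.getBytes("UTF-8"))
    md.update(0.toByte) // separator between the version and the content
    md.update(scriptBytes)
    md.digest().map(b => f"${b & 0xff}%02x").mkString
  }

  /** Returns the cached binary for `script`, calling `build(script, output)` only on a miss. */
  def cachedBinary(cacheDir: Path, script: Path, toolVersion: String)(
    build: (Path, Path) => Unit
  ): Path = {
    val bin = cacheDir.resolve(key(Files.readAllBytes(script), toolVersion))
    if (!Files.exists(bin)) {
      Files.createDirectories(cacheDir)
      build(script, bin)
    }
    bin
  }
}
```

With this in place, the cost of generating the native binary is only paid on the first run after each change to the script.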
It would be very useful to have this feature; it would make deployment much easier.
I think this is a very useful feature. Any updates? Is there a workaround, with an example of going from *.sc to *.jar?