Discussion - JARs and native executables from scripts

Open alexarchambault opened this issue 5 years ago • 9 comments

Opening this to know if there's some interest and discuss details of a possible implementation.

Basically, what I'd like to discuss is adding the possibility to do things like

$ amm script.sc --jar
$ ./script.jar

That makes it easier to copy scripts into one's bin directory (without having to copy the script itself and all the scripts it imports). It also lets one generate the JAR once (with the right Ammonite version) and then forget about the exact invocation used to generate it.

(I remember seeing such a feature being hinted at some time ago, maybe by @lihaoyi, but can't find out where…)

On top of that, we could add the possibility to convert these JARs to native executables, via GraalVM and / or scala-native, for faster start-up times:

$ amm script.sc --native-image
$ ./script
$ amm script.sc --scala-native
$ ./script

What kind of JAR should we generate?

Even though assemblies seem like a natural choice, I'd argue coursier bootstraps make a better candidate.

Assemblies

Assemblies (a.k.a. "fat JARs" or "uber JARs") are quite common when one wants to package a JVM application in a single JAR. They basically take all the JARs of the classpath and merge their content. Yet they suffer from some drawbacks, most notably:

  • some collisions may arise when merging JARs content
  • it's hard to make sense of what's in an assembly once it's packaged: we can't really know which dependencies are in it, which versions they have, etc.
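To make the first drawback concrete, here is a minimal sketch of how merge collisions could be detected before building an assembly. It assumes the entry names of each JAR have already been listed; the object name, the JAR names, and the entries in the usage note are all made up for illustration.

```scala
object AssemblyCollisions {
  /** Maps each entry name found in more than one JAR to the (sorted) JARs providing it.
    *
    * `jarEntries` maps a JAR name to the entry names it contains.
    */
  def find(jarEntries: Map[String, Seq[String]]): Map[String, Seq[String]] =
    jarEntries.toSeq
      // One (entry, jar) pair per distinct entry of each JAR
      .flatMap { case (jar, entries) => entries.distinct.map(entry => entry -> jar) }
      .groupBy(_._1)
      .map { case (entry, pairs) => entry -> pairs.map(_._2).sorted }
      // Only entries provided by several JARs need a merge strategy
      .filter { case (_, jars) => jars.size > 1 }
}
```

For instance, if hypothetical a.jar and b.jar both ship a reference.conf, `AssemblyCollisions.find` reports that entry with both JAR names, and an assembly would have to pick one copy or concatenate them.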

Nested JARs

To circumvent these shortcomings, Spring Boot and coursier both allow nesting JARs, so that they don't have to be merged:

$ cs bootstrap ammonite:2.0.4 --standalone
$ unzip -l amm
…
   146682  01-14-2020 05:24   coursier/bootstrap/launcher/jars/ammonite_2.13.1-2.0.4.jar
   178031  01-14-2020 05:24   coursier/bootstrap/launcher/jars/ammonite-terminal_2.13-2.0.4.jar
   131203  01-14-2020 05:24   coursier/bootstrap/launcher/jars/ammonite-ops_2.13-2.0.4.jar
   120047  01-14-2020 05:24   coursier/bootstrap/launcher/jars/ammonite-util_2.13-2.0.4.jar
   206114  01-14-2020 05:24   coursier/bootstrap/launcher/jars/ammonite-runtime_2.13.1-2.0.4.jar
…

coursier bootstraps

One shortcoming of both assemblies and nesting JARs is the size of the resulting JAR:

$ ls -lh amm
-rwxr-xr-x  1 alex  staff    35M oct 28 14:32 amm

To make such JARs smaller, coursier relies on the fact that most of these nested JARs come straight from public repositories, such as Maven Central. Instead of embedding the JARs themselves, it can embed their URLs, like https://repo1.maven.org/maven2/com/lihaoyi/ammonite-repl_2.13.1/2.0.4/ammonite-repl_2.13.1-2.0.4.jar:

$ rm -f amm
$ cs bootstrap ammonite:2.0.4
$ ls -lh amm
-rwxr-xr-x  1 alex  staff    31K oct 28 14:32 amm
$ unzip -p amm coursier/bootstrap/launcher/bootstrap-jar-urls
…
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp-api_2.13.1/2.0.4/ammonite-interp-api_2.13.1-2.0.4-sources.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp-api_2.13.1/2.0.4/ammonite-interp-api_2.13.1-2.0.4.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp_2.13.1/2.0.4/ammonite-interp_2.13.1-2.0.4-sources.jar
https://repo1.maven.org/maven2/com/lihaoyi/ammonite-interp_2.13.1/2.0.4/ammonite-interp_2.13.1-2.0.4.jar
…

Upon startup, this JAR ensures all these URLs are available in the coursier cache, and simply loads them from there.
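As a rough illustration of that start-up check, the sketch below maps each embedded URL to a location under a cache root and reports the ones that still need downloading. The scheme/host/path layout is a simplified approximation of the coursier cache (it ignores ports, credentials and escaping), and the object and method names are hypothetical rather than taken from coursier.

```scala
import java.net.URI
import java.nio.file.{Files, Path}

object BootstrapCache {
  /** Location of `url` under `cacheRoot`, laid out as <root>/<scheme>/<host>/<path>. */
  def cachePath(cacheRoot: Path, url: String): Path = {
    val uri = URI.create(url)
    cacheRoot
      .resolve(uri.getScheme)
      .resolve(uri.getHost)
      .resolve(uri.getPath.stripPrefix("/"))
  }

  /** URLs whose files are not in the cache yet, and so would have to be downloaded. */
  def missing(cacheRoot: Path, urls: Seq[String]): Seq[String] =
    urls.filterNot(url => Files.exists(cachePath(cacheRoot, url)))
}
```

Once `missing` comes back empty, a launcher could hand the cached paths to a `java.net.URLClassLoader` and call the main class from there.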

In coursier, these JARs are generated by the coursier-launcher library, which also allows mixing nested JARs and URLs, so that JARs from public dependencies can be embedded as URLs, while others can be nested.

For Ammonite, I'd propose to use the coursier-launcher library too. By default, JARs from public URLs can be embedded as URLs, while the JAR containing the bytecode resulting from compiling the script itself can be nested. The resulting JARs would have a minimal size, making them fast to generate and handy to move around.

Optionally, --standalone and --assembly options could be supported by Ammonite, to nest all JARs or generate an assembly.
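The default and --standalone behaviours described above could be sketched as a small planning step. This does not use the coursier-launcher API; the types and names below are hypothetical and only show the decision being made for each classpath element.

```scala
object LauncherPlan {
  sealed trait Entry
  /** Only the URL goes in the launcher; the JAR is fetched into the cache at start-up. */
  final case class EmbeddedUrl(url: String) extends Entry
  /** The JAR itself is stored inside the launcher. */
  final case class NestedJar(path: String) extends Entry

  /** A classpath element: its local path, and the public URL it came from, if any. */
  final case class Artifact(path: String, publicUrl: Option[String])

  def plan(classPath: Seq[Artifact], standalone: Boolean): Seq[Entry] =
    classPath.map { artifact =>
      artifact.publicUrl match {
        // Public dependencies are referenced by URL, unless a standalone launcher is requested
        case Some(url) if !standalone => EmbeddedUrl(url)
        // The compiled script (no public URL) is always nested
        case _ => NestedJar(artifact.path)
      }
    }
}
```

An --assembly mode would be a third strategy that merges JAR contents instead of producing either kind of entry.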

Ammonite API uses

Scripts may use the interpreter API, like interp.load.ivy("org" %% "name" % "ver"), to interact with Ammonite itself. Once the script is packaged as a JAR, the Ammonite runtime isn't there anymore to handle such calls.

I'd propose that these calls either throw or have no effect, and that their use be discouraged, just like for BSP support.

Prior to actually running the script, the main class of the generated JAR can set up a dummy InterpAPI implementation at ammonite.interp.api.InterpBridge.value0 (where interp from the user code comes from).
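A minimal sketch of that set-up step follows. To stay self-contained it does not depend on Ammonite: `MiniInterpApi` and `MiniInterpBridge` are hypothetical, heavily reduced stand-ins for InterpAPI and InterpBridge, with a single method standing in for interp.load.ivy.

```scala
object BridgeSketch {
  /** Hypothetical, reduced stand-in for Ammonite's InterpAPI. */
  trait MiniInterpApi {
    def loadIvy(org: String, name: String, version: String): Unit
  }

  /** Hypothetical stand-in for the bridge object that `interp` resolves to in user code. */
  object MiniInterpBridge {
    @volatile var value0: MiniInterpApi = null
  }

  /** Variant that fails loudly when a packaged script calls the interpreter API. */
  final class ThrowingInterp extends MiniInterpApi {
    def loadIvy(org: String, name: String, version: String): Unit =
      throw new UnsupportedOperationException(
        s"Cannot load $org:$name:$version, no interpreter is available in a packaged script")
  }

  /** Variant that silently ignores such calls. */
  final class NoOpInterp extends MiniInterpApi {
    def loadIvy(org: String, name: String, version: String): Unit = ()
  }

  /** What the generated main class would do: install a dummy bridge, then run the script. */
  def run(strict: Boolean)(script: () => Unit): Unit = {
    MiniInterpBridge.value0 = if (strict) new ThrowingInterp else new NoOpInterp
    script()
  }
}
```

Throwing makes the limitation obvious at the first offending call, while the no-op variant keeps scripts running when the call was only needed at compile time.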

GraalVM

Just like sbt-native-image or the coursier CLI itself, Ammonite could fetch GraalVM archives via the coursier CLI or coursier-jvm (whose dependency graph could be made thinner…), ensure native-image is installed (via gu install native-image), and generate native images with it.

Alongside that, it could also allow users to pass a GraalVM installation root directory.

One point to pay attention to is the options users might want to pass to native-image (such as these). Sensible options for the classes of the standard library could be passed by default, but users should be allowed to pass their own options. Maybe these could be read from comments in the main script, or extra arguments passed on the command line could be forwarded to native-image, like

$ amm script.sc --graalvm \
    --enable-all-security-services # this one is for native-image
$ ./script
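Both ways of collecting native-image options could look roughly like the sketch below. The set of flags Ammonite keeps for itself and the `// native-image:` comment directive are assumptions made for illustration; neither exists in Ammonite today.

```scala
object NativeImageOptions {
  /** Hypothetical: flags consumed by Ammonite itself, never forwarded. */
  val ammoniteFlags: Set[String] =
    Set("--jar", "--graalvm", "--native-image", "--scala-native")

  /** Hypothetical comment directive carrying native-image options in a script. */
  val directive = "// native-image:"

  /** Options on the command line that Ammonite does not recognise go to native-image. */
  def fromCommandLine(args: Seq[String]): Seq[String] =
    args.filter(arg => arg.startsWith("-") && !ammoniteFlags.contains(arg))

  /** Options listed in directive comments of the main script, in order of appearance. */
  def fromComments(script: String): Seq[String] =
    script.linesIterator
      .map(_.trim)
      .filter(_.startsWith(directive))
      .flatMap(line => line.stripPrefix(directive).trim.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .toList
}
```

This simple filter does not handle options that take a separate value argument; a real implementation would need an explicit separator or a proper option parser.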

Scala Native

The coursier CLI can already generate Scala Native executables (this requires the JVM launcher of the coursier CLI, not the native one):

$ coursier bootstrap --native io.get-coursier::echo::1.0.4 -o echo
$ ./echo foo
foo

The upcoming Scala 2.12 support in Scala Native makes it possible to bring that feature to Ammonite, on top of the packaging capabilities above.

The code required to call Scala Native is minimal. Being able to call multiple versions of Scala Native adds a bit of complexity (currently, coursier publishes one module per Scala Native version, such as 0.3.0 and 0.4.0-M2, and fetches one of them prior to generating an executable).

For that to work, all the JARs of the classpath needed to run the script need to be cross-compiled for Scala Native. Ideally, assuming the script is run with --thin, only the dependencies of com.lihaoyi:::ammonite-interp-api need to be cross-compiled. These are:

$ cs resolve com.lihaoyi:ammonite-interp-api_2.13.3:2.2.0
com.lihaoyi:ammonite-interp-api_2.13.3:2.2.0:default
com.lihaoyi:ammonite-ops_2.13:2.2.0:default
com.lihaoyi:ammonite-util_2.13:2.2.0:default
com.lihaoyi:fansi_2.13:0.2.9:default
com.lihaoyi:geny_2.13:0.6.2:default
com.lihaoyi:os-lib_2.13:0.7.1:default
com.lihaoyi:pprint_2.13:0.5.9:default
com.lihaoyi:sourcecode_2.13:0.2.1:default
io.get-coursier:interface:0.0.21:default
net.java.dev.jna:jna:5.3.1:default
org.jline:jline:3.15.0:default
org.scala-lang:scala-compiler:2.13.3:default
org.scala-lang:scala-library:2.13.3:default
org.scala-lang:scala-reflect:2.13.3:default
org.scala-lang.modules:scala-collection-compat_2.13:2.1.2:default

In practice, if the user code doesn't reference non-cross-compiled libraries, we should be able to build a valid Scala Native executable for it. To make that work better, we can try either:

  • resuming the work of https://github.com/lihaoyi/Ammonite/pull/941, to strip more user-facing libraries when --thin is passed, or
  • just passing fewer libraries to scalac when compiling scripts for Scala Native (so that Scala Native scripts would be compiled slightly differently from non-Scala Native ones)

alexarchambault avatar Oct 29 '20 11:10 alexarchambault

I think this is a very useful feature, especially the thin binaries generated by coursier bootstrap. This will simplify distributing Ammonite scripts to users.

mushtaq avatar Oct 29 '20 13:10 mushtaq

I'd say the first thing we should implement are assemblies, but we should do so in a way that leaves the door open for further backends in future: coursier-bootstraps, scala-native binaries, scala.js binaries (why not?), an unpacked folder-full-of-classfiles, etc.

I imagine that most of the work necessary to get pre-compilation would be shared regardless of how we end up packaging the final output. Assemblies, for all their downsides, are the dumbest and most broadly familiar of the above options, so they should definitely come first, but that doesn't mean we can't provide alternatives once we get the dumb, straightforward thing working.

lihaoyi avatar Oct 29 '20 14:10 lihaoyi

> I'd say the first thing we should implement are assemblies, but we should do so in a way that leaves the door open for further backends in future: coursier-bootstraps, scala-native binaries, scala.js binaries (why not?), an unpacked folder-full-of-classfiles, etc.

Sure, why not. Scala.js output would be nice too, yes!

> I imagine that most of the work necessary to get pre-compilation would be shared regardless of how we end up packaging the final output.

Indeed, I think we'll mainly need a module with a main class that is able to create and set up bridge implementations, then load the script entrypoint class and call its $main method. For that module to also work from Scala Native and Scala.js, we should probably use portable-scala-reflect.
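On the JVM, the "load the entrypoint and call $main" step could be sketched with plain Java reflection, as below. `DemoScript` is a made-up stand-in for a compiled script object, and the String return type of its `$main` is an assumption. This version is JVM-only, and a portable one would go through portable-scala-reflect instead.

```scala
object EntrypointSketch {
  /** Stand-in for the object a script gets compiled to (hypothetical). */
  object DemoScript {
    def $main(): String = "script ran"
  }

  /** Loads a Scala object by the name of its module class and calls its `$main` method.
    *
    * `moduleClassName` is the JVM name of the object's class, which ends in `$`.
    */
  def runEntrypoint(moduleClassName: String): AnyRef = {
    val cls = Class.forName(moduleClassName)
    // scalac exposes the singleton instance of an object in the static MODULE$ field
    val module = cls.getField("MODULE$").get(null)
    cls.getMethod("$main").invoke(module)
  }
}
```

Calling `EntrypointSketch.runEntrypoint(EntrypointSketch.DemoScript.getClass.getName)` runs the stand-in script without referring to it statically, which is what a generic launcher main class needs.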

alexarchambault avatar Oct 30 '20 18:10 alexarchambault

For Scala Native and Scala.js, narrowing the scope of what needs to be cross-compiled might need a bit of work too (see the very last point I mentioned in my original comment).

alexarchambault avatar Oct 30 '20 18:10 alexarchambault

Duplicate of #919

I'm glad to see the idea is finally getting interest ;-)

david-bouyssie avatar Nov 04 '20 16:11 david-bouyssie

> Duplicate of #919

I think that's what I was referring to in the OP but couldn't recall. (I had no recollection of mentions of native stuff though.)

alexarchambault avatar Nov 05 '20 08:11 alexarchambault

Ideally, there should be a shebang that can be added to Scala script files that automatically compiles them to native executables, caches this binary, and runs it. I don’t know how much more expensive generating a native binary is though. I imagine Scala Native should be fast, and GraalVM slow?
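The caching part of such a launcher could be keyed on the script's content, roughly as sketched here. The names are hypothetical, and a real key would also have to cover imported scripts and the options used to build the binary.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Path
import java.security.MessageDigest

object BinaryCache {
  /** Hex SHA-256 of the tool version and the script content. */
  def key(scriptContent: String, toolVersion: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.update(toolVersion.getBytes(StandardCharsets.UTF_8))
    digest.update(0.toByte) // separator, so ("ab", "c") and ("a", "bc") hash differently
    digest.update(scriptContent.getBytes(StandardCharsets.UTF_8))
    digest.digest().map(b => "%02x".format(b & 0xff)).mkString
  }

  /** Where the compiled binary for this script and tool version would be stored. */
  def binaryPath(cacheDir: Path, scriptContent: String, toolVersion: String): Path =
    cacheDir.resolve(key(scriptContent, toolVersion))
}
```

A shebang launcher would compute `binaryPath`, build the executable only if that file does not exist, and then run it, so the compilation cost is paid once per script version.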

NightMachinery avatar May 14 '21 13:05 NightMachinery

It's very useful to have this feature; it would make deployment much easier.

thinkiny avatar Aug 03 '21 03:08 thinkiny

I think this is a very useful feature. Any updates? Is there a workaround, with an example of converting *.sc to *.jar?

mvillafuertem avatar Oct 10 '21 19:10 mvillafuertem