bloop
bloop copied to clipboard
Fix Scala.js linker to be incremental
The Zinc back-end currently does not expose a list of class files that changed during the last compilation run. Therefore, every time run and link is executed on a Scala.js or Scala Native project, the target file is linked anew.
The underlying reason for this is that the sbt plugin for Scala.js and Native have an extra level of caching in the build tool that checks that if no changes in the source files has happened it's not worth running the linker again. For Scala.js, this happens here:
https://github.com/scala-js/scala-js/blob/88ed21bff68b7baddbd2ab93b7ed6c026908d957/sbt-plugin/src/main/scala/org/scalajs/sbtplugin/ScalaJSPluginInternal.scala#L183-L209
I don't understand the reason why this is required in the Scala.js sbt plugin. @sjrd In bloop we don't have source or resource generators, and the compile function doesn't run if source files haven't changed, is it possible that the linked logic to cache in the build tool is hiding a bug in the Scala.js linker?
Linker.link will link and produce a new .js file every time you call it. That's its contract. It will probably do stuff internally to maintain an incremental state (if you're reusing the same instance of Linker), but in all cases it will produce a new .js file.
If the set of inputs hasn't changed at all, it is the responsibility of the code calling Linker.link to avoid calling it in the first place.
the compile function doesn't run if source files haven't changed
That must come from some logic in bloop, doesn't it? You should extend that logic not to call Linker.link in that case.
Thanks for explaining the contract of link.
That must come from some logic in bloop, doesn't it? You should extend that logic not to call Linker.link in that case.
The logic for this is in Zinc, so it's not easily usable. I'll rework how it works so that we can also re-use the change detection in the sources across the different use cases in Bloop.
@tindzk As commented in Gitter, I'll take on this in my branch of pipelined compilation that gives us direct access to Zinc internal APIs.
FileTracker could be used to check whether the sjsir files have changed: https://github.com/scalacenter/bloop/blob/master/frontend/src/main/scala/bloop/io/FileTracker.scala
I lean towards the implementation of an in-memory index of changed files. An index is required because this information is required by several tasks in our task pipeline and currently each of those tasks compute change detection in its own way. For example, Zinc uses hashes to detect changes in sources, the file watcher does the same to deduplicate file watch events, this task requires also the change detection of sources, etc.
FileTracker doesn't fit well into this picture, so that why I haven't suggested it from the beginning.
Therefore, every time run and link is executed on a Scala.js or Scala Native project, the target file is linked anew.
@tindzk Do you mean by this that Scala.js takes n seconds to link and then generate a new target file or is Scala.js creating a new file but doing so in a quick way (because it's re-using the results from the previous incremental result)?
If it's the second one, we'll need to wait until I implement the index. If it isn't, then it means that our use of Scala.js APIs is incorrect and we're not linking incrementally. @sjrd mentioned that the contract is that it creates a new link file, but we're not that bothered if it creates a new link file fast because the IR files didn't change.
Here you are recreating a new instance of Linker every time you call link, so you're definitely not linking incrementally.
@jvican If no files have changed in a project, linking it with Bloop only takes 1-2s, but this overhead becomes noticeable when linking multiple unchanged projects.
@sjrd Thanks for pointing that out! I will fix it.
So let's fix incremental linking in Scala.js, since that's the bottleneck here. In the future, when I implement this index, we'll avoid the call to the linker.
I initially added this to the v1.0.0 milestone, but now that I think more about it I'll delay work on this area to v1.0.1 or v1.1.0, which should happen relatively soon after 1.0.0. I don't have time to work on a serious fix for this, which will require some changes to use ClearableLinker and keep the linker state in Scala.js controlled. We also need to make sure that the linker is thread-safe (ClearableLinker isn't, but I hope StandardLinker is -- otherwise we'll have some trouble).
Linkers are not thread-safe. They have a fail-safe to throw an exception early if you try to use them concurrently.
You should never need to call a Linker concurrently from two threads, since you should have a separate Linker for every classpath that you intend to link. In sbt terms, that means one instance of Linker per project x configuration (if you have 3 projects each with Compile and Test, you have 6 instances of Linker). Since there's no reason to link twice the exact same classpath concurrently, Linkers are not thread-safe. It is of course safe to use two different Linkers concurrently.
The issue here is that Bloop, as a compile server, handles concurrent compilation requests for the same project. This use case is common in the setting of remote compilation, and therefore I'd like to use an incremental linker to process two different links of the same project at once (note that the same project may have the same classpath but different sources).
Can you clone a linker at a given state? That would solve our thread safety issues. If we had a way to clone the linker, then we would make sure that when we get a concurrent request we "fork" the linker state, and at the end the linker that comes last wins and gets persisted to be used in subsequent requests.
There is no way to clone a linker.
You will have to track for which projects you are already linking. Then when a request to link a project/classpath that is already being linked arrives, you have two options:
- wait for the existing linking to finish before reusing the
Linker. The second run will benefit from the work done by the concurrently runninglink()invocation, since it's going to be able to incrementally reuse what it did. - instantiate a new
Linkeron the fly. This kills the incremental state but linking can start immediately.
I recommend the first one.