Limit parallel Scala.js linking jobs to avoid high memory pressure
Fix: https://github.com/com-lihaoyi/mill/issues/6226
I just hardcoded the limit to 2 for now.
Pull request: https://github.com/com-lihaoyi/mill/pull/6260
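Roughly, the approach is a process-wide semaphore that gates how many Scala.js link jobs run at once. Here is a minimal sketch of the idea (not the actual diff; `ScalaJSLinkLimiter` and `withLinkPermit` are made-up names):

```scala
import java.util.concurrent.Semaphore

object ScalaJSLinkLimiter {
  // Hardcoded to 2 for now; see the discussion below about a good default.
  private val permits = new Semaphore(2)

  // Run `body` only once a permit is available, so at most
  // two Scala.js link jobs execute at any given time.
  def withLinkPermit[T](body: => T): T = {
    permits.acquire()
    try body
    finally permits.release()
  }
}
```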
I'm just witnessing a fatal OOM in a coursier release process: https://github.com/coursier/coursier/actions/runs/19949228402/job/57231194898
    [3549] core.js[2.12.20].resolvedMvnDeps 658s
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to java_pid2482.hprof ...
    Heap dump file created [6755211786 bytes in 76.676 secs]
    Exception in thread "Process ID Checker Thread" java.lang.OutOfMemoryError: Java heap space
        at java.base/jdk.internal.misc.Unsafe.allocateInstance(Native Method)
        at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:501)
        at java.base/java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(DirectMethodHandle$Holder)
        at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
        at mill.server.Server$.checkProcessIdFile(Server.scala:484)
        at mill.server.Server$.$anonfun$7(Server.scala:511)
        at mill.server.Server$$$Lambda/0x00007f12b00ffd38.run(Unknown Source)
        at java.base/java.lang.Thread.runWith(Thread.java:1596)
        at java.base/java.lang.Thread.run(Thread.java:1583)
I think I want to merge this.
On the bright side: At least it is reproducible, and we know what the problem is. 🙂
@lefou Are we sure this solves the problem? The problem is not about limiting the parallelism, but about limiting the memory consumption of the Scala.js linkers. Even if we limit it to 2, we still have a cache that stores up to `jobs` linkers. So we could still end up allocating the same amount of memory, no?
@lolgab I guess that depends on the implementation. Currently I work around the problem by manually forcing a concurrency limit, in order (to my naive understanding) to avoid the system choking / memory thrashing.
https://github.com/PurpleKingdomGames/indigoengine/blob/main/ci.sh#L9-L10
TBH, I have no idea. I'm just trying to solve a blocker issue based on the provided input.
We probably also need to limit the cached jobs to the same size. Alternatively, or in addition, we could try to hold the cache in a soft or weak reference, so that the garbage collector has a chance to evict unused instances. (We did this before for some caches, but I'm not sure that code is still in place, since there have been many rounds of refactoring since.)
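Something along these lines, as a hand-rolled sketch (`SoftCache` is not an existing Mill class):

```scala
import java.lang.ref.SoftReference
import scala.collection.mutable

// Sketch of the soft-reference idea: the GC may clear entries under
// memory pressure, and we re-create them on demand. Note that when the
// GC clears a reference, nothing invokes a cleanup method on the
// evicted value.
final class SoftCache[K, V](create: K => V) {
  private val entries = mutable.Map.empty[K, SoftReference[V]]

  def get(key: K): V = synchronized {
    entries.get(key).flatMap(ref => Option(ref.get())) match {
      case Some(value) => value
      case None =>
        val value = create(key)
        entries(key) = new SoftReference(value)
        value
    }
  }
}
```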
Question: What is a good default for parallel linker jobs? This PR uses `2`, mostly because that was reported as a good number, but I have no idea what's reasonable.
We should also add some metrics, so we better understand the error cases.
> Alternatively, or in addition, we could try to hold the cache in a soft or weak reference, so that the garbage collector has a chance to evict unused instances. (We did this before for some caches, but I'm not sure that code is still in place, since there have been many rounds of refactoring since.)
This unfortunately doesn't work. We did it before, but it wasn't working, because Scala.js needs a cleanup method to be called to clean the cache, otherwise it leaks. SoftReference caches can't run finalizers when their entries get garbage collected.
@davesmith00000 Could you by any chance check if this PR as-is fixes your issue (without applying your other workarounds, like limiting `--jobs`)?
cc @alexarchambault for awareness.
https://github.com/com-lihaoyi/mill/pulls#issuecomment-3615994465
Regarding the linker state: I don't know what the benefits of not clearing the linker are, but we should be able to auto-clean it after each use, roughly as sketched below. That hopefully means we don't hog unneeded memory, but still keep the JIT-ed classes around.
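A hypothetical sketch of that auto-clean wrapper (`L` and `clearState` stand in for the real linker type and whatever cleanup hook it exposes; the JIT-ed classes survive because the worker's classloader itself stays cached):

```scala
// Run the link job, then eagerly drop the linker's retained state so it
// doesn't hog memory between uses. Everything here is a placeholder for
// the real Scala.js linker API.
def linkAutoCleaned[L, T](linker: L, clearState: L => Unit)(use: L => T): T = {
  try use(linker)
  finally clearState(linker)
}
```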
But maybe you don't mean the linker state, but the `IRFileCache.Cache`. I don't know what's best to do here.
> Regarding the linker state: I don't know what the benefits of not clearing the linker are, but we should be able to auto-clean it after each use. That hopefully means we don't hog unneeded memory, but still keep the JIT-ed classes around.
This basically kills the benefit of having a worker, since the Scala.js linker is no longer incremental.
I'm thinking about what the best approach is to avoid OOMs while keeping good parallelism and the incremental state.
I guess we need some runtime stats, and then decide based on total and/or relative memory consumption which caches to keep and which to evict. Theoretically, there are various kinds of data a worker can keep, but not all state might provide the same benefit when kept. E.g. intermediate compile results can be written to and read from disk, yet still provide a benefit over re-computing the whole result. In the end, a cache so large that it causes OOMs is worse than no cache at all.
A classloader cache is much cheaper than some in-memory cache of intermediate binary results, while still ensuring high performance thanks to JIT-ed bytecode.
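As a crude starting point for such stats, a heap-pressure check based on JVM runtime numbers could look like this (a sketch; the threshold and where to hook it in are open questions):

```scala
// Returns true when the used heap exceeds `threshold` of the maximum
// heap. A worker cache could consult this before keeping an entry
// alive, evicting (and properly cleaning up) entries when memory is tight.
def underMemoryPressure(threshold: Double = 0.8): Boolean = {
  val rt = Runtime.getRuntime
  val used = rt.totalMemory() - rt.freeMemory()
  used.toDouble / rt.maxMemory() > threshold
}
```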
@lefou I thought I'd try quickly testing this during my lunch break, but my efforts have been hampered by the forced upgrade to Scala 3.8.0-RC1 that Mill requires:
1. 3.8.0-RC1 seems to have some weird behaviours around unused-code checks (sometimes it's wrong...).
2. 3.8.0-RC1 has reclassified some warnings, it seems, so a lot of patching was required.
3. I don't understand the relationship between Mill's Scala 3 version, my plugin's Scala 3 version, and my main project's Scala 3 version. Currently, if they aren't aligned, bad things happen.
4. The tests of one of my modules now refuse to compile.
Anyway, in terms of concurrently running fastOptJS, it seems better. I can't be 100% sure until I fix point (4) above, but it was happily linking 8-10 modules concurrently at one point.
Thank you @davesmith00000! I assume that before, you were not able to run 8-10 link tasks in parallel.
I'll merge this PR in the hope it helps. At least, it shouldn't make things worse. We can address the ScalaJS worker cache in a separate PR.
I've tried to wrap my head around the ScalaJSWorker caching many times. It's complicated. To recap, this is what we have:
We have a first layer of caching, where we keep an instance of `ScalaJSWorkerImpl` for every distinct Scala.js classloader.
So, more or less, we have an entry for every distinct `scalaJSVersion` used in the process, with a maximum of `ctx.jobs` entries.
Then we have a second layer of caching, where we cache the linkers: every `ScalaJSWorkerImpl` instance holds up to `ctx.jobs` linkers, keyed by ~every entry of the module/`isFullLinkJS` matrix. So in the worst case, that's up to `ctx.jobs * ctx.jobs` linkers alive at once.
On top of this, the way `mill.util.CachedFactory` works is that if you request more entries than `maxCacheSize`, it allocates a new linker, links, and then disposes of it right away.
What I would want is a single limit that is somehow shared by the two caches, so we keep control over the total number of linkers we instantiate, not only over the ones held by a single instantiated `ScalaJSWorkerImpl`.
Moreover, maybe that behavior of creating an instance and dropping it right away in `mill.util.CachedFactory` is part of the problem. Maybe it should keep an internal semaphore, like the one you implemented for Scala.js, and block callers that try to create more entries than `maxCacheSize`?
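A simplified sketch of that idea (this is not `mill.util.CachedFactory`'s real interface; `BoundedFactory` is a made-up name). Instead of allocating a throwaway instance when the cache is exhausted, callers block on a semaphore until an instance is free:

```scala
import java.util.concurrent.Semaphore
import scala.collection.mutable

abstract class BoundedFactory[K, V](maxCacheSize: Int) {
  def setup(key: K): V
  def teardown(key: K, value: V): Unit

  // At most `maxCacheSize` values may be in use concurrently; extra
  // callers wait here instead of allocating a throwaway instance.
  private val permits = new Semaphore(maxCacheSize)
  private val cache = mutable.Map.empty[K, V]

  def withValue[T](key: K)(use: V => T): T = {
    permits.acquire()
    val value = cache.synchronized(cache.remove(key)).getOrElse(setup(key))
    try use(value)
    finally {
      cache.synchronized {
        // Keep the instance for reuse, unless the slot for this key is
        // already taken or the cache is full; then dispose of it properly.
        if (cache.contains(key) || cache.size >= maxCacheSize) teardown(key, value)
        else cache.update(key, value)
      }
      permits.release()
    }
  }
}
```

Sharing one process-wide semaphore across all `ScalaJSWorkerImpl` instances would then give the single global limit described above, instead of one limit per worker.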
> I assume that before, you were not able to run 8-10 link tasks in parallel.
Correct. Previously it would start 8-10 in parallel but grind to a halt.
The behaviour I observe now is that it completes what it can and only gets stuck on the troublesome module, which is an unrelated issue.