rules_kotlin icon indicating copy to clipboard operation
rules_kotlin copied to clipboard

Worker multiplex corruption

Open restingbull opened this issue 2 years ago • 10 comments

Following up on #531, the coroutine implementation of multiplexing requests suffers from corruption issues.

Currently, switching to a pure java implementation, but the source of the corruption should be understood.

restingbull avatar Jan 06 '22 17:01 restingbull

I might be off-base here, not being familiar with the implementation. But I implemented the multi-threaded persistent Kotlin compiler usage in Buck a few years back. Are you guys are using a single-instance of the Kotlin compiler class (as Buck did)? If so, I could definitely see how a coroutine-based implementation could go wrong. The Kotlin compiler class supports multithreaded compilation, with appropriate isolation between threads, but this is likely to go off the rails with coroutines which by their nature execute from a pool of threads. Because the Kotlin compiler uses things like ThreadLocals to hold state (as far as I remember), re-using threads as coroutines do is almost guaranteed to bleed state between compilations.

Just my thoughts at first glance.

EDIT: Looking at the code in Buck (it's been a while since I've seen it), it seems we actually instantiate an instance of the compiler for each compilation. But this is still as big win, as the classes remain loaded and are JIT compiled after a number of compilations:

https://github.com/facebook/buck/blob/e5708e9b6d345c6654de98af529792e5f397ffe0/src/com/facebook/buck/jvm/kotlin/JarBackedReflectedKotlinc.java#L176

In fact, in terms of speed, my clean Buck build takes about 1m2s, whereas my Bazel build is running at ~3m. I haven't yet profiled to see where the costs are but I suspect the Kotlin compiles are taking longer by comparison.

brettwooldridge avatar Mar 08 '22 13:03 brettwooldridge

Thanks @brettwooldridge, see https://github.com/bazelbuild/rules_kotlin/pull/650 where @restingbull is trying to repro and fix this issue. FWIW in the above PR, we're seeing the same corruption without Coroutines but maybe something is still missing.

nkoroste avatar Mar 08 '22 16:03 nkoroste

@nkoroste @restingbull I have a few questions:

  • Is my reading of the code correct in that the .kt and .java sources are partitioned and compiled separately, rather than throwing both into the K2JVMCompiler?
  • If so, is there a reason this is done?

I ask, because in the Buck implementation of the kotlin_library rule (for example), we just throw both sources into the K2JVMCompiler.exec() method. It seemed to be substantially faster that way, and resolves "circular" dependencies between the two sets of source files, while avoiding have to merge jars afterwards.

brettwooldridge avatar Mar 08 '22 20:03 brettwooldridge

@nkoroste @restingbull Regarding the presence of ThreadLocals in the K2JVMCompiler, I believe that the ThreadLocals are bound to the instance, rather than static, which is probably why we wound up instantiating a K2JVMCompiler for each thread invoking the compiler ... rather than a singleton.

EDIT: The cost of instantiating a K2JVMCompiler is basically zero.

brettwooldridge avatar Mar 08 '22 20:03 brettwooldridge

Is my reading of the code correct in that the .kt and .java sources are partitioned and compiled separately, rather than throwing both into the K2JVMCompiler?

The java compilation will disappear entirely in the very near future -- It was being compiled separately when I started maintaining the project, so I can't tell you it there was a compelling reason.

Regarding the presence of ThreadLocals in the K2JVMCompiler, I believe that the ThreadLocals are bound to the instance, rather than static, which is probably why we wound up instantiating a K2JVMCompiler for each thread invoking the compiler ... rather than a singleton.

It was a singleton when I got there... I'll try switching it up and see what happens.

restingbull avatar Mar 09 '22 17:03 restingbull

@restingbull is this fixed now that https://github.com/bazelbuild/rules_kotlin/pull/703 is merged?

rockwotj avatar Apr 02 '22 02:04 rockwotj

No, unfortunately not.

You can use --worker_quit_after_build to workaround this issue however this will negatively impact your incremental build performance.

nkoroste avatar Apr 03 '22 18:04 nkoroste

Got a reply from Kotlin team in some offline conversation, I'm going to add it here for context:

We're using multithreaded Kotlin compilation quite extensively, particularly via Kotlin compiler daemon, and we've been unaware of corruption or reliability problems for quite some years already. The key is to use the simple K2JVMCompiler.exec entry point and set kotlin.environment.keepalive system property to "true". According to our experience, in this setting, the Kotlin compiler daemon survives quite a high parallel load in the CI usage, provided that the process is given enough memory. With this usage, the compiler cares about sharing and isolation itself. Mostly about isolation since the primary data that is being shared is the jar cache. There might be ways to do it more effectively, but it works for us for now. If you're interested in some implementation details, you can check the daemon code starting from org.jetbrains.kotlin.daemon.CompileServiceImplBase.compileImpl

nkoroste avatar Oct 12 '23 15:10 nkoroste

Some of the issue maybe with the cache and bazel. Last I looked some caches used file modification time to determine cache behavior which bazel deterministically sets those to the same value as far as I remember. So its likely something unique to bazel in terms of how the cache works or that everything is a symlink

rockwotj avatar Oct 12 '23 17:10 rockwotj

Was looking through Google's Kotlin bazel implementation and came across this which potentially can help?

    private static void clearJarCache() {
        // This is adapted from CompileServiceImpl.clearJarCache in the Kotlin compiler daemon.
        // It is needed because CoreJarFileSystem stores a global cache of jar file central
        // directories---and this cache can become stale when it is used across multiple
        // compilations. One would normally expect this sort of cleanup to be handled
        // by K2JVMCompiler directly through the project disposable---but this is not the case.
        // Instead, persistent workers must clear the caches manually. For example, in Gradle, you
        // can see that GradleKotlinCompilerWork.compileWithDaemon invokes daemon.clearJarCache
        // after each compilation job. Side note: it would make more sense (and improve performance)
        // to only clear these caches after *all* jobs from a given build session have finished.
        // However, this does not seem feasible for Bazel persistent workers. And anyway this
        // optimization is not implemented in Gradle either (there is just a to-do comment).
        ZipHandler.clearFileAccessorCache();
        KotlinCoreApplicationEnvironment appEnv =
                KotlinCoreEnvironment.Companion.getApplicationEnvironment();
        if (appEnv != null) {
            VirtualFileSystem jarFileSystem = appEnv.getJarFileSystem();
            if (jarFileSystem instanceof CoreJarFileSystem) {
                ((CoreJarFileSystem) jarFileSystem).clearHandlersCache();
            }
        }
    }

arunkumar9t2 avatar Feb 12 '24 18:02 arunkumar9t2