
Concurrent Parsing Bottlenecks

Open gchallen opened this issue 2 years ago • 3 comments

I'm running into a similar slowdown to the one reported a few years ago here: https://github.com/antlr/antlr4/issues/2454.

Specifically, I'm using ANTLR4 for a variety of syntax analysis tasks in a batch-processing tool that tries to leverage a high core-count machine to improve performance. Unfortunately, as I increase the number of cores, both the shared DFA cache and the prediction context cache can become bottlenecks, frequently leaving N - 1 of the available cores idle; a stack dump shows them all blocked waiting for access to the shared maps. addDFAState is a particularly problematic spot.

So at present I'm essentially passing individual copies of each of these caches to each newly-created parser, like so:

import org.antlr.v4.runtime.Parser
import org.antlr.v4.runtime.atn.ParserATNSimulator
import org.antlr.v4.runtime.atn.PredictionContextCache
import org.antlr.v4.runtime.dfa.DFA

fun Parser.makeThreadSafe() {
    // Swap in a fresh interpreter backed by a private DFA array and a private
    // prediction context cache, so this parser never touches the shared maps.
    interpreter = ParserATNSimulator(
        this,
        this.atn,
        // One empty DFA per decision, mirroring the shared decisionToDFA layout.
        interpreter.decisionToDFA.mapIndexed { i, _ -> DFA(atn.getDecisionState(i), i) }.toTypedArray(),
        PredictionContextCache()
    )
}
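
For reference, each worker thread then builds and detaches its own parser before parsing. A rough sketch, assuming generated KotlinLexer/KotlinParser classes with a kotlinFile start rule for the Kotlin grammar mentioned below (substitute whatever grammar you're using):

import org.antlr.v4.runtime.CharStreams
import org.antlr.v4.runtime.CommonTokenStream

// KotlinLexer / KotlinParser are the classes generated from the grammar;
// they stand in here for whichever parser a given task needs.
fun parseIsolated(source: String): KotlinParser.KotlinFileContext {
    val lexer = KotlinLexer(CharStreams.fromString(source))
    val parser = KotlinParser(CommonTokenStream(lexer))
    parser.makeThreadSafe()  // private caches: no contention, but no sharing either
    return parser.kotlinFile()
}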

This is particularly problematic with grammars that perform poorly, such as the official Kotlin .g4 grammar, which can initially take multiple seconds to process even small inputs. Normally the shared caches produce an order-of-magnitude speedup once the grammar is warm, but keeping all cores active requires disabling them, which my testing shows costs roughly 3x on at least some inputs. Since this is a data-processing pipeline that runs for days, that 3x does end up hurting.

So I'm in a pickle. If I were optimizing primarily for single-core performance I'd use the caches and live with the (infrequent) bottlenecks. But for a concurrent workload I'm stuck choosing between one fast(er) parser and N slow(er) parsers.

I took a look at the optimized ANTLR4 fork, which apparently maintains these shared structures lock-free. However, the official optimized fork is still on v4.9, and the most recently maintained fork of that fork is only up to v4.11; both versions cause compatibility problems with other tools we depend on, specifically checkstyle, which expects a different ATN serialization version.

I've had a few other ideas about how to address this, but I'm not sure whether they'd work properly. One would be to maintain N copies of the caches and let each thread check out one that's not in use (sketched below). This would cause them to warm up more slowly, but that might be an acceptable tradeoff, since at least for the Kotlin grammar performance improves quite quickly. Unfortunately this is slightly complicated by the fact that we use multiple different parsers in the project, and I assume the decisionToDFA array is parser-specific. It might work, but I'd be happier if someone has tried something like this and can report that it can in fact be done.
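
Here's roughly what I have in mind, with one pool per grammar since decisionToDFA is tied to a specific ATN. The pool size and the blocking-queue checkout are my own choices, not anything ANTLR provides:

import java.util.concurrent.ArrayBlockingQueue
import org.antlr.v4.runtime.atn.ATN
import org.antlr.v4.runtime.atn.PredictionContextCache
import org.antlr.v4.runtime.dfa.DFA

// One independently-warming set of caches for a single grammar's ATN.
class CacheSet(atn: ATN) {
    val decisionToDFA: Array<DFA> =
        Array(atn.numberOfDecisions) { i -> DFA(atn.getDecisionState(i), i) }
    val contextCache = PredictionContextCache()
}

// N cache sets per grammar; threads block until one is free, so no two
// threads ever mutate the same DFA array concurrently.
class CachePool(atn: ATN, size: Int) {
    private val pool = ArrayBlockingQueue<CacheSet>(size).apply {
        repeat(size) { put(CacheSet(atn)) }
    }

    fun <T> withCaches(block: (CacheSet) -> T): T {
        val caches = pool.take()
        try {
            return block(caches)
        } finally {
            pool.put(caches)
        }
    }
}

Inside withCaches a thread would install the checked-out set on its parser via ParserATNSimulator, the same way makeThreadSafe does above.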

Another option would be for each parser to grab its own copy of the caches on startup and then publish its updates atomically once it's done. This would also cause the caches to warm more slowly, but at least each thread would get the benefit of some prior parsing passes. The problem here is that I'm unsure how to copy either shared structure, and they don't seem to be set up for this, even in memory. (I'm not worried about saving state across restarts.)

You could also run a few test inputs during warmup to populate the prediction caches, then at some point start handing threads their own copies and discard further updates. But this again requires being able to copy these structures; a sketch of the lifecycle shared by these last two ideas follows.
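
For concreteness, the lifecycle both of these last two ideas need looks something like the following. deepCopyOf is entirely hypothetical: as far as I can tell the runtime exposes no way to clone a DFA array or a PredictionContextCache, which is exactly the missing piece.

import java.util.concurrent.atomic.AtomicReference
import org.antlr.v4.runtime.atn.PredictionContextCache
import org.antlr.v4.runtime.dfa.DFA

// Bundle of the two shared structures, as in CacheSet above.
class CacheSnapshot(val decisionToDFA: Array<DFA>, val contextCache: PredictionContextCache)

// Hypothetical: no such operation exists in the runtime today.
fun deepCopyOf(caches: CacheSnapshot): CacheSnapshot = TODO("needs runtime support")

class SnapshotCache(initial: CacheSnapshot) {
    private val canonical = AtomicReference(initial)

    fun <T> parseWithSnapshot(block: (CacheSnapshot) -> T): T {
        val snapshot = deepCopyOf(canonical.get())  // private, pre-warmed copy
        val result = block(snapshot)
        canonical.set(snapshot)  // publish the now-warmer copy; last writer wins
        return result
    }
}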

Any help or guidance would be appreciated! Thanks in advance. ANTLR is a great project, and if anything these bottlenecks are a sign of how often we're using it throughout our codebase...

gchallen · Jul 02 '23 20:07