antlr4 icon indicating copy to clipboard operation
antlr4 copied to clipboard

[Performance] Constants for Python runtime should be folded

Open KvanTTT opened this issue 2 years ago • 2 comments

The problem

Consider the following grammar:

grammar Test;
e: T1 | T2 | T3;
T1: 't1';
T2: 't2';
T3: 't3';

For Python runtime the following code is generated for rule e:

def e(self):
    localctx = TestParser.EContext(self, self._ctx, self.state)
    self.enterRule(localctx, 0, self.RULE_e)
    self._la = 0 # Token type
    try:
        self.enterOuterAlt(localctx, 1)
        self.state = 0
        _la = self._input.LA(1)
        if not((((_la) & ~0x3f) == 0 and ((1 << _la) & ((1 << TestParser.T1) | (1 << TestParser.T2) | (1 << TestParser.T3))) != 0)):
            self._errHandler.recoverInline(self)
        else:
            self._errHandler.reportMatch(self)
            self.consume()
    except RecognitionException as re:
        localctx.exception = re
        self._errHandler.reportError(self, re)
        self._errHandler.recover(self, re)
    finally:
        self.exitRule()
    return localctx

The following constant expression is not folded during interpretation:

(1 << TestParser.T1) | (1 << TestParser.T2) | (1 << TestParser.T3)

In Java bytecode it's just 14. In this case these constant should be folded during generation.

How to check it?

I've created a benchmark: https://gist.github.com/KvanTTT/e3b355f7e321fe7f52e11ea1aa0ecbce#file-python_constant_folding_benchmark-py

It returns the following timings (python 3.8.5):

no_folding_const_refs_test: 1769 ns
no_folding_const_literals_test: 81 ns
folding: 84 ns

for the following methods:

def no_folding_const_refs_test():
    return (1 << C1) | (1 << C2) | (1 << C3) | (1 << C4) | (1 << C5) | (1 << C6) | \
        (1 << C7) | (1 << C8) | (1 << C9) | (1 << C10) | (1 << C11) | (1 << C12) | \
        (1 << C13) | (1 << C14) | (1 << C15) | (1 << C16)


def no_folding_const_literals_test():
    return (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) | \
        (1 << 7) | (1 << 8) | (1 << 9) | (1 << 10) | (1 << 11) | (1 << 12) | \
        (1 << 13) | (1 << 14) | (1 << 15) | (1 << 16)


def folding_test():
    return 0xFFFF

no_folding_const_refs_test works ~20 times slower than optimized versions. I vote for folding_test because it's the shortest code and it provides more shorter generated code.

Probably some other runtimes also don't perform constant folding (I know about Java and C#, they perform constant folding).

KvanTTT avatar May 07 '22 13:05 KvanTTT

Created a patch script that implements this optimization: https://gist.github.com/amykyta3/8285559e95074c4431c2836d78b36530

amykyta3 avatar May 08 '22 19:05 amykyta3

Thanks, but it should be (if it will) implemented via ANTLR tool like https://github.com/antlr/antlr4/pull/3701

KvanTTT avatar May 08 '22 20:05 KvanTTT