Poor performance when parsing an unclosed string with many escape characters
The lexer performs poorly on unclosed strings that contain many escape characters. The slowdown seems to come from the string_re regular expression, which backtracks when the closing quote is missing. It should be possible to make this run in linear or near-linear time by adjusting the regex or the lexer.
import time

from jinja2 import Environment
from jinja2.lexer import get_lexer


def timer(func):
    # Report how long the wrapped function takes to run.
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(f"Function '{func.__name__}' executed in {elapsed_time:.4f}s")
        return result
    return wrapper


def sizer(func):
    # Report the length and size of the generated payload.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(
            f"Payload length: {len(result)} characters\n"
            f"Payload size: {len(result.encode()) / (1024 ** 2):.2f} MB"
        )
        return result
    return wrapper


@sizer
def create_payload(char_length: int) -> str:
    # The slow pattern: r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
    # This is slow on a string that starts with a quote, has many escape sequences,
    # and ends with characters that will cause backtracking.
    payload = "'" + ("\\a" + "b" * char_length) * char_length + "c"
    return payload


@timer
def runner(char_length: int):
    try:
        # Create the payload
        payload = create_payload(char_length)
        # Get the lexer from jinja2
        env = Environment()
        lexer = get_lexer(env)
        # Tokenize the payload - this is the slow step
        tokens = list(lexer.tokenize(payload))
        return tokens
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None


if __name__ == '__main__':
    runner(100)
    runner(1000)
    runner(2000)
    runner(4000)
    runner(8000)
    runner(10000)
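In case it helps narrow this down, here is a rough sketch that times the quoted pattern in isolation, outside the lexer, so the regex can be compared against the full tokenize step. The pattern literal is copied from the comment in the script above rather than imported from Jinja, and the sizes are arbitrary; it only measures how long a failed match takes, without assuming a particular result.

import re
import time

# Pattern copied from the comment above (string_re-style), compiled here directly
# so the measurement does not depend on Jinja internals.
string_pattern = re.compile(
    r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
)


def time_failed_match(char_length: int) -> float:
    # Same unclosed-string payload as create_payload() above.
    payload = "'" + ("\\a" + "b" * char_length) * char_length + "c"
    start = time.time()
    # match() returns None because the quote is never closed; we time that failure.
    assert string_pattern.match(payload) is None
    return time.time() - start


for n in (100, 1000, 2000, 4000):
    print(f"char_length={n}: failed match in {time_failed_match(n):.4f}s")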
Environment:
- Python version: 3.13.5
- Jinja version: 3.1.6
Especially in shared environments, proper rlimits, timeouts, etc. should be set.
Also, unless you use the sandboxed environment, templates must never come from untrusted sources anyway (otherwise malicious templates can lead to RCE).
In any case, I think we'd be open to a PR to fix this.
I've updated this to directly talk about the performance issue rather than attacks and DoS, as described in our security policy.
Happy to review a PR, or just suggestions. Perhaps it needs a small change, or to be split into multiple expressions, or something else? I'm not very good at analyzing these, so if you have a suggestion, since you noticed the issue in the first place, that would be helpful.
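One direction that might be worth benchmarking, purely as a sketch and not a verified fix: if the slowdown really does come from string_re backtracking after a missing closing quote, the escape loop could be made possessive so the engine cannot re-explore already-consumed text on failure. Possessive quantifiers require Python 3.11+, which may rule this out for the Python versions Jinja currently supports; atomic groups (?>...) have the same constraint. Because [^'\\] and \\ can never match the same character, the possessive variant should accept exactly the same strings as the current pattern, but a proper test run would be needed to confirm both equivalence and the performance effect.

import re

# Current pattern (as quoted in the report) and a possessive variant.
# The possessive quantifiers (*+) need Python 3.11+; this is an untested
# sketch of the idea, not a drop-in patch for jinja2.lexer.string_re.
current = re.compile(
    r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
)
possessive = re.compile(
    r"('([^'\\]*+(?:\\.[^'\\]*+)*+)'" r'|"([^"\\]*+(?:\\.[^"\\]*+)*+)")'
)

for sample in ("'a\\'b'", '"x\\ny"', "'unclosed \\a" + "b" * 50):
    # Both patterns should agree on what they match; only the failure
    # behaviour (amount of backtracking) should differ.
    assert bool(current.match(sample)) == bool(possessive.match(sample))
    print(repr(sample), "->", bool(current.match(sample)))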
Thanks for adjusting my original submission, guys; I did not intend to report improperly. This report came from a researcher and I am conducting coordinated vulnerability disclosure (CVD), so I am rather limited on technical expertise beyond what was submitted. It's also difficult in that we cannot tag the reporter in to contribute, as that would defeat the purpose of our CVD policy.
Were you all able to reproduce the issue? We are interested in getting a CVE assigned for it.
There are a limited number of maintainers with a limited amount of time. You're welcome to point the reporter here, since they seem to be the one with expertise in regex performance and could offer advice, and we'd be happy to review a PR from them. Otherwise, we'll create one in our own time, or when another contributor is interested.
We don't plan to issue a CVE for this. As outlined in our security policy, regex performance is considered a normal performance issue, not a security issue.
I would be interested in working on this issue. Would you like to assign it to me, or do I simply prepare a PR and send it over when it is ready?