Poor performance when parsing an unclosed string with many escape characters
The lexer performs poorly on unclosed strings that contain many escape characters. The slowdown seems to come from the string_re regular expression, which backtracks when the closing quote is missing. It should be possible to make this run in linear or near-linear time by adjusting the regex or the lexer.
import time

from jinja2 import Environment
from jinja2.lexer import get_lexer


def timer(func):
    # Report how long the wrapped function takes to run.
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(f"Function '{func.__name__}' executed in {elapsed_time:.4f}s")
        return result
    return wrapper


def sizer(func):
    # Report the length and size of the generated payload.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(
            f"Payload length: {len(result)} characters\n"
            f"Payload size: {len(result.encode()) / (1024 ** 2):.2f} MB"
        )
        return result
    return wrapper


@sizer
def create_payload(char_length: int) -> str:
    # The slow pattern: r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
    # This is slow on a string that starts with a quote, has many escape sequences,
    # and ends with characters that will cause backtracking.
    payload = "'" + ("\\a" + "b" * char_length) * char_length + "c"
    return payload


@timer
def runner(char_length: int):
    try:
        # Create the payload
        payload = create_payload(char_length)
        # Get the lexer from jinja2
        env = Environment()
        lexer = get_lexer(env)
        # Tokenize the payload - this is the slow step
        tokens = list(lexer.tokenize(payload))
        return tokens
    except Exception as e:
        print(f"Exception occurred: {e}")
        return None


if __name__ == '__main__':
    runner(100)
    runner(1000)
    runner(2000)
    runner(4000)
    runner(8000)
    runner(10000)
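In case it helps narrow this down, here is a rough sketch that times the quoted pattern in isolation, outside the lexer, so the regex can be compared against the full tokenize step. The pattern literal is copied from the comment in the script above rather than imported from Jinja, and the sizes are arbitrary; it only measures how long a failed match takes, without assuming a particular result.

import re
import time

# Pattern copied from the comment above (string_re-style), compiled here directly
# so the measurement does not depend on Jinja internals.
string_pattern = re.compile(
    r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
)


def time_failed_match(char_length: int) -> float:
    # Same unclosed-string payload as create_payload() above.
    payload = "'" + ("\\a" + "b" * char_length) * char_length + "c"
    start = time.time()
    # match() returns None because the quote is never closed; we time that failure.
    assert string_pattern.match(payload) is None
    return time.time() - start


for n in (100, 1000, 2000, 4000):
    print(f"char_length={n}: failed match in {time_failed_match(n):.4f}s")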
Environment:
- Python version: 3.13.5
- Jinja version: 3.1.6
Especially in shared environments, proper rlimits, timeouts, etc. should be set.
Also, unless you use the sandboxed environment, templates must never come from untrusted sources anyway (otherwise malicious templates can lead to RCE).
In any case, I think we'd be open to a PR to fix this.
I've updated this to directly talk about the performance issue rather than attacks and DoS, as described in our security policy.
Happy to review a PR, or just suggestions. Perhaps it needs a small change, or to be split into multiple expressions, or something else? I'm not very good at analyzing these, so if you have a suggestion, since you noticed the issue in the first place, that would be helpful.
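One direction that might be worth benchmarking, purely as a sketch and not a verified fix: if the slowdown really does come from string_re backtracking after a missing closing quote, the escape loop could be made possessive so the engine cannot re-explore already-consumed text on failure. Possessive quantifiers require Python 3.11+, which may rule this out for the Python versions Jinja currently supports; atomic groups (?>...) have the same constraint. Because [^'\\] and \\ can never match the same character, the possessive variant should accept exactly the same strings as the current pattern, but a proper test run would be needed to confirm both equivalence and the performance effect.

import re

# Current pattern (as quoted in the report) and a possessive variant.
# The possessive quantifiers (*+) need Python 3.11+; this is an untested
# sketch of the idea, not a drop-in patch for jinja2.lexer.string_re.
current = re.compile(
    r"('([^'\\]*(?:\\.[^'\\]*)*)'" r'|"([^"\\]*(?:\\.[^"\\]*)*)")'
)
possessive = re.compile(
    r"('([^'\\]*+(?:\\.[^'\\]*+)*+)'" r'|"([^"\\]*+(?:\\.[^"\\]*+)*+)")'
)

for sample in ("'a\\'b'", '"x\\ny"', "'unclosed \\a" + "b" * 50):
    # Both patterns should agree on what they match; only the failure
    # behaviour (amount of backtracking) should differ.
    assert bool(current.match(sample)) == bool(possessive.match(sample))
    print(repr(sample), "->", bool(current.match(sample)))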
Thanks for adjusting my original submission, guys; I did not intend to report improperly. This report came from a researcher and I am conducting coordinated vulnerability disclosure (CVD), so I am rather limited on technical expertise beyond what was submitted. It's also difficult in that we cannot tag the reporter in to contribute, as that would defeat the purpose of our CVD policy.
Were you all able to reproduce the issue? We are interested in getting a CVE assigned for it.
There are a limited number of maintainers with a limited amount of time. You're welcome to point the reporter here, since they seem to be the one with expertise in regex performance and could offer advice, and we'd be happy to review a PR from them. Otherwise, we'll create one in our own time, or when another contributor is interested.
We don't plan to issue a CVE for this. As outlined in our security policy, regex performance is considered a normal performance issue, not a security issue.
I would be interested in working on this issue. Would you like to assign it to me, or do I simply prepare a PR and send it over when it is ready?