Fixed #35675 - avoid backtracking in Template parsing
Trac ticket number
ticket-35675
Branch description
This branch adds lookaheads to the regex pattern used by Templates to avoid the excessive backtracking described by 35675. I will post the script I used to test the performance and results in a follow-up comment.
Checklist
- [x] This PR targets the
mainbranch. - [x] The commit message is written in past tense, mentions the ticket number, and ends with a period.
- [x] I have checked the "Has patch" ticket flag in the Trac system.
- [x] I have added or updated relevant tests.
- [x] I have added or updated relevant docs, including release notes if applicable.
- [x] I have attached screenshots in both light and dark modes for any UI changes.
I used this script to test performance:
import timeit
if __name__ == "__main__":
n_exec = 100
n_nested = 2000
template_str_cases = [
('var', '{{ somevar }}'),
('tag', '{% if something %}'),
('comment', '{# somecomment #}'),
('mismatched', '{{ thesedontmatch #}'),
('var_nested_open', '{%'*n_nested),
('tag_nested_open', '{{'*n_nested),
('comment_nested_open', '{#'*n_nested),
('var_nested_closed', '{%'*n_nested + '%}'),
('tag_nested_closed', '{{'*n_nested + '}}'),
('comment_nested_closed', '{#'*n_nested + '#}'),
('mismatched_nested', '{{'*n_nested + '#}'),
]
for case_name, template_str in template_str_cases:
time = timeit.timeit("Lexer('" +template_str + "').tokenize()", setup="from django.template.base import Lexer", number=n_exec)
t_per_exec = time / n_exec
print("----------------\nCASE: {}".format(case_name))
print(f"Total time taken: {time:.6f} seconds")
print(f"Time per execution: {t_per_exec:.6f} seconds")
With the following output: No lookaheads -
----------------
CASE: var
Total time taken: 0.000072 seconds
Time per execution: 0.000001 seconds
----------------
CASE: tag
Total time taken: 0.000074 seconds
Time per execution: 0.000001 seconds
----------------
CASE: comment
Total time taken: 0.000065 seconds
Time per execution: 0.000001 seconds
----------------
CASE: mismatched
Total time taken: 0.000051 seconds
Time per execution: 0.000001 seconds
----------------
CASE: var_nested_open
Total time taken: 2.305461 seconds
Time per execution: 0.023055 seconds
----------------
CASE: tag_nested_open
Total time taken: 4.184063 seconds
Time per execution: 0.041841 seconds
----------------
CASE: comment_nested_open
Total time taken: 2.290232 seconds
Time per execution: 0.022902 seconds
----------------
CASE: var_nested_closed
Total time taken: 0.002476 seconds
Time per execution: 0.000025 seconds
----------------
CASE: tag_nested_closed
Total time taken: 0.002256 seconds
Time per execution: 0.000023 seconds
----------------
CASE: comment_nested_closed
Total time taken: 0.002438 seconds
Time per execution: 0.000024 seconds
----------------
CASE: mismatched_nested
Total time taken: 4.188701 seconds
Time per execution: 0.041887 seconds
With lookaheads -
----------------
CASE: var
Total time taken: 0.000076 seconds
Time per execution: 0.000001 seconds
----------------
CASE: tag
Total time taken: 0.000081 seconds
Time per execution: 0.000001 seconds
----------------
CASE: comment
Total time taken: 0.000070 seconds
Time per execution: 0.000001 seconds
----------------
CASE: mismatched
Total time taken: 0.000043 seconds
Time per execution: 0.000000 seconds
----------------
CASE: var_nested_open
Total time taken: 1.107526 seconds
Time per execution: 0.011075 seconds
----------------
CASE: tag_nested_open
Total time taken: 0.513218 seconds
Time per execution: 0.005132 seconds
----------------
CASE: comment_nested_open
Total time taken: 1.100234 seconds
Time per execution: 0.011002 seconds
----------------
CASE: var_nested_closed
Total time taken: 0.002648 seconds
Time per execution: 0.000026 seconds
----------------
CASE: tag_nested_closed
Total time taken: 0.002371 seconds
Time per execution: 0.000024 seconds
----------------
CASE: comment_nested_closed
Total time taken: 0.002600 seconds
Time per execution: 0.000026 seconds
----------------
CASE: mismatched_nested
Total time taken: 0.514964 seconds
Time per execution: 0.005150 seconds
A quick summary of the results is that the common cases (e.g. {{ variable }}) have no performance impact while cases that previously causes excessive backtracking are significantly sped up.
Would you be able to add some of these example to the test suite - it would really help with not only making sure it works correctly, but also that it doesn't regress in future.
Sure! Are you talking about adding them as a benchmark or just unit tests? Also I believe this PR needs the djangonauts and benchmark labels
Unit tests mostly. If the test case would take a long time without the patch, but is nearly instant with, that's ideal. My original case from the ticket might be a good starting point.
Sure, but I'm unclear how you make a reliable unit test for timing? Do you have a suggestion?
Sure, but I'm unclear how you make a reliable unit test for timing? Do you have a suggestion?
I wouldn't worry about testing the timings When we add tests around DOS issues, we just add a unit test which shows the pattern that was slow and the test should take less time. You can look into https://github.com/django/django-asv and add a benchmark, this would be great but not mandatory