django icon indicating copy to clipboard operation
django copied to clipboard

Fixed #35675 - avoid backtracking in Template parsing

Open haileyajohnson opened this issue 1 year ago • 6 comments

Trac ticket number

ticket-35675

Branch description

This branch adds lookaheads to the regex pattern used by Templates to avoid the excessive backtracking described by 35675. I will post the script I used to test the performance and results in a follow-up comment.

Checklist

  • [x] This PR targets the main branch.
  • [x] The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • [x] I have checked the "Has patch" ticket flag in the Trac system.
  • [x] I have added or updated relevant tests.
  • [x] I have added or updated relevant docs, including release notes if applicable.
  • [x] I have attached screenshots in both light and dark modes for any UI changes.

haileyajohnson avatar Apr 08 '25 16:04 haileyajohnson

I used this script to test performance:

import timeit

if __name__ == "__main__":
    n_exec = 100
    n_nested = 2000
    template_str_cases = [
        ('var', '{{ somevar }}'),
        ('tag', '{% if something %}'),
        ('comment', '{# somecomment #}'),
        ('mismatched', '{{ thesedontmatch #}'),
        ('var_nested_open', '{%'*n_nested),
        ('tag_nested_open', '{{'*n_nested),
        ('comment_nested_open', '{#'*n_nested),
        ('var_nested_closed', '{%'*n_nested + '%}'),
        ('tag_nested_closed', '{{'*n_nested + '}}'),
        ('comment_nested_closed', '{#'*n_nested + '#}'),
        ('mismatched_nested', '{{'*n_nested + '#}'),
        ]

    for case_name, template_str in template_str_cases:
        time = timeit.timeit("Lexer('" +template_str + "').tokenize()", setup="from django.template.base import Lexer", number=n_exec)
        t_per_exec = time / n_exec
        print("----------------\nCASE: {}".format(case_name))
        print(f"Total time taken: {time:.6f} seconds")
        print(f"Time per execution: {t_per_exec:.6f} seconds")

With the following output: No lookaheads -

----------------
CASE: var
Total time taken: 0.000072 seconds
Time per execution: 0.000001 seconds
----------------
CASE: tag
Total time taken: 0.000074 seconds
Time per execution: 0.000001 seconds
----------------
CASE: comment
Total time taken: 0.000065 seconds
Time per execution: 0.000001 seconds
----------------
CASE: mismatched
Total time taken: 0.000051 seconds
Time per execution: 0.000001 seconds
----------------
CASE: var_nested_open
Total time taken: 2.305461 seconds
Time per execution: 0.023055 seconds
----------------
CASE: tag_nested_open
Total time taken: 4.184063 seconds
Time per execution: 0.041841 seconds
----------------
CASE: comment_nested_open
Total time taken: 2.290232 seconds
Time per execution: 0.022902 seconds
----------------
CASE: var_nested_closed
Total time taken: 0.002476 seconds
Time per execution: 0.000025 seconds
----------------
CASE: tag_nested_closed
Total time taken: 0.002256 seconds
Time per execution: 0.000023 seconds
----------------
CASE: comment_nested_closed
Total time taken: 0.002438 seconds
Time per execution: 0.000024 seconds
----------------
CASE: mismatched_nested
Total time taken: 4.188701 seconds
Time per execution: 0.041887 seconds

With lookaheads -

----------------
CASE: var
Total time taken: 0.000076 seconds
Time per execution: 0.000001 seconds
----------------
CASE: tag
Total time taken: 0.000081 seconds
Time per execution: 0.000001 seconds
----------------
CASE: comment
Total time taken: 0.000070 seconds
Time per execution: 0.000001 seconds
----------------
CASE: mismatched
Total time taken: 0.000043 seconds
Time per execution: 0.000000 seconds
----------------
CASE: var_nested_open
Total time taken: 1.107526 seconds
Time per execution: 0.011075 seconds
----------------
CASE: tag_nested_open
Total time taken: 0.513218 seconds
Time per execution: 0.005132 seconds
----------------
CASE: comment_nested_open
Total time taken: 1.100234 seconds
Time per execution: 0.011002 seconds
----------------
CASE: var_nested_closed
Total time taken: 0.002648 seconds
Time per execution: 0.000026 seconds
----------------
CASE: tag_nested_closed
Total time taken: 0.002371 seconds
Time per execution: 0.000024 seconds
----------------
CASE: comment_nested_closed
Total time taken: 0.002600 seconds
Time per execution: 0.000026 seconds
----------------
CASE: mismatched_nested
Total time taken: 0.514964 seconds
Time per execution: 0.005150 seconds

A quick summary of the results is that the common cases (e.g. {{ variable }}) have no performance impact while cases that previously causes excessive backtracking are significantly sped up.

haileyajohnson avatar Apr 08 '25 16:04 haileyajohnson

Would you be able to add some of these example to the test suite - it would really help with not only making sure it works correctly, but also that it doesn't regress in future.

RealOrangeOne avatar Apr 08 '25 16:04 RealOrangeOne

Sure! Are you talking about adding them as a benchmark or just unit tests? Also I believe this PR needs the djangonauts and benchmark labels

haileyajohnson avatar Apr 08 '25 17:04 haileyajohnson

Unit tests mostly. If the test case would take a long time without the patch, but is nearly instant with, that's ideal. My original case from the ticket might be a good starting point.

RealOrangeOne avatar Apr 08 '25 18:04 RealOrangeOne

Sure, but I'm unclear how you make a reliable unit test for timing? Do you have a suggestion?

haileyajohnson avatar Apr 08 '25 19:04 haileyajohnson

Sure, but I'm unclear how you make a reliable unit test for timing? Do you have a suggestion?

I wouldn't worry about testing the timings When we add tests around DOS issues, we just add a unit test which shows the pattern that was slow and the test should take less time. You can look into https://github.com/django/django-asv and add a benchmark, this would be great but not mandatory

sarahboyce avatar Apr 09 '25 11:04 sarahboyce