sqlparse icon indicating copy to clipboard operation
sqlparse copied to clipboard

Quadratic performance when grouping is enabled.

Open living180 opened this issue 3 years ago • 0 comments

I use sqlparse via django-debug-toolbar. I was noticing some very slow performance when rendering some of our queries, in particular those having the form SELECT ... FROM ... WHERE id IN (<a list of several hundred integers>).

After doing some more in-depth performance analysis, I found that there were at least two sources of quadratic performance when formatting a SQL statement with a filter stack having grouping enabled:

  • In the TokenList.group_tokens() method, when called with extend=True, the value of the group is recomputed each time it is extended. Computing the value of the group is linear with the number of tokens in the group. For queries of the form referenced above, the group_identifier_list() function ends up calling TokenList.group_tokens() with extend=True once for each of the integers in the IN clause, resulting in quadratic behavior.

  • In the TokenList._token_matching() method, a slice of the self.tokens list is taken. Taking the slice is roughly linear with the number of tokens, and there are several places where TokenList._token_matching() is also called linearly with respect to the number of tokens, resulting in quadratic behavior.

Here is a link to an IPython notebook demonstrating this quadratic performance: https://gist.github.com/living180/ad9f83b6e1fb494e1305a281d4b552b3

I have separate fixes for each of these issues which I will submit as pull requests.

living180 avatar May 19 '21 12:05 living180