sqlparse
sqlparse copied to clipboard
Quadratic performance when grouping is enabled.
I use sqlparse via django-debug-toolbar. I was noticing some very slow performance when rendering some of our queries, in particular those having the form SELECT ... FROM ... WHERE id IN (<a list of several hundred integers>)
.
After doing some more in-depth performance analysis, I found that there were at least two sources of quadratic performance when formatting a SQL statement with a filter stack having grouping enabled:
-
In the
TokenList.group_tokens()
method, when called withextend=True
, the value of the group is recomputed each time it is extended. Computing the value of the group is linear with the number of tokens in the group. For queries of the form referenced above, thegroup_identifier_list()
function ends up callingTokenList.group_tokens()
withextend=True
once for each of the integers in theIN
clause, resulting in quadratic behavior. -
In the
TokenList._token_matching()
method, a slice of theself.tokens
list is taken. Taking the slice is roughly linear with the number of tokens, and there are several places whereTokenList._token_matching()
is also called linearly with respect to the number of tokens, resulting in quadratic behavior.
Here is a link to an IPython notebook demonstrating this quadratic performance: https://gist.github.com/living180/ad9f83b6e1fb494e1305a281d4b552b3
I have separate fixes for each of these issues which I will submit as pull requests.