couchbase-lite-core
couchbase-lite-core copied to clipboard
CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses
Context of why the change fixes something?
@borrrden For expressions with many redundant layers of parentheses (i.e. (((expr)))) N1QL parser performance decreases exponentially, the more levels of parentheses. An expression that takes 60ms with 1 pair of parentheses, takes several minutes with 5.
The reason is because the parser operates recursively, and previously the only expression which captured parentheses is at the very end of that recursive chain, at _baseExpr. So for every level of parentheses, the parser would go through expr9, 8, 7, etc until reaching _baseExpr, strip one level of parentheses, then start at the top again.
This change strips all but 1 pair of parentheses from each expression, at the very top-level. So all unnecessary recursion due to parentheses is avoided.
Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?
Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?
@jianminzhao No? I don't believe that's relevant?
For the cases of redundant parentheses, I saw an expression like
SELECT * FROM _ WHERE ((((((type == '7')))))) reduced from several minutes to milliseconds.
FYI this fix is already in 3.1.7 also, this is just a port.
@jianminzhao See previous PR https://github.com/couchbase/couchbase-lite-core/pull/1983 Jens was also happy with this change.
Code Coverage Results:
| Type | Percentage |
|---|---|
| branches | 69.85 |
| functions | 80.42 |
| instantiations | 36.84 |
| lines | 80.37 |
| regions | 76.97 |
@jianminzhao I ran the test out of curiosity. This issue doesn't really show with the small amount (4) nested parentheses in the test. I changed it to 6 nested, I get this in master:
Elapsed time/check time = 2.52361/0.5
And in this branch:
Elapsed time/check time = 0.000412333/0.5
With 4 nested, in master:
Elapsed time/check time = 0.026721/0.5
In this branch:
Elapsed time/check time = 0.000403917/0.5
The purpose of that test is to catch regression in performance. PEG has quirky thing that some rules may affect performance gravely. The degradation can show up in as simple as several level of parenthesis. How does your new rule affect that test? Will it be run the test just as fast no matter how many levels of parenthesis?
@jianminzhao Exactly yes. Because it removes all the parentheses at the top level, without recursion, the performance impact of the number of parentheses is now negligible.
That is not the purpose of that test. The purpose of the test is to discover performance degradation of other grammar rules. Now, this test becomes meaningless.
The problem is solved though, so what is the issue? If you want to write more tests to try other potentially performance impacting queries that's fine.
The purpose of this commit is to improve the performance by removing unnecessary parenthesis. The purpose of this particular test is to magnify the performance degradation of additional grammar rule. In short, this test can be removed with this commit.