couchbase-lite-core icon indicating copy to clipboard operation
couchbase-lite-core copied to clipboard

CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses

Open callumbirks opened this issue 1 year ago • 13 comments

callumbirks avatar Mar 18 '24 14:03 callumbirks

Context of why the change fixes something?

borrrden avatar Mar 21 '24 01:03 borrrden

@borrrden For expressions with many redundant layers of parentheses (i.e. (((expr)))) N1QL parser performance decreases exponentially, the more levels of parentheses. An expression that takes 60ms with 1 pair of parentheses, takes several minutes with 5. The reason is because the parser operates recursively, and previously the only expression which captured parentheses is at the very end of that recursive chain, at _baseExpr. So for every level of parentheses, the parser would go through expr9, 8, 7, etc until reaching _baseExpr, strip one level of parentheses, then start at the top again.

This change strips all but 1 pair of parentheses from each expression, at the very top-level. So all unnecessary recursion due to parentheses is avoided.

callumbirks avatar Mar 22 '24 10:03 callumbirks

Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?

jianminzhao avatar Mar 25 '24 17:03 jianminzhao

Have you compared with this test, "N1QL Performance"/"N1QL benchmark"?

@jianminzhao No? I don't believe that's relevant? For the cases of redundant parentheses, I saw an expression like SELECT * FROM _ WHERE ((((((type == '7')))))) reduced from several minutes to milliseconds. FYI this fix is already in 3.1.7 also, this is just a port.

callumbirks avatar Mar 27 '24 11:03 callumbirks

@jianminzhao See previous PR https://github.com/couchbase/couchbase-lite-core/pull/1983 Jens was also happy with this change.

callumbirks avatar Mar 27 '24 11:03 callumbirks

Code Coverage Results:

Type Percentage
branches 69.85
functions 80.42
instantiations 36.84
lines 80.37
regions 76.97

cbl-bot avatar Mar 27 '24 12:03 cbl-bot

@jianminzhao I ran the test out of curiosity. This issue doesn't really show with the small amount (4) nested parentheses in the test. I changed it to 6 nested, I get this in master:

Elapsed time/check time = 2.52361/0.5

And in this branch:

Elapsed time/check time = 0.000412333/0.5

callumbirks avatar Mar 27 '24 12:03 callumbirks

With 4 nested, in master:

Elapsed time/check time = 0.026721/0.5

In this branch:

Elapsed time/check time = 0.000403917/0.5

callumbirks avatar Mar 27 '24 12:03 callumbirks

The purpose of that test is to catch regression in performance. PEG has quirky thing that some rules may affect performance gravely. The degradation can show up in as simple as several level of parenthesis. How does your new rule affect that test? Will it be run the test just as fast no matter how many levels of parenthesis?

jianminzhao avatar Mar 27 '24 16:03 jianminzhao

@jianminzhao Exactly yes. Because it removes all the parentheses at the top level, without recursion, the performance impact of the number of parentheses is now negligible.

callumbirks avatar Mar 27 '24 17:03 callumbirks

That is not the purpose of that test. The purpose of the test is to discover performance degradation of other grammar rules. Now, this test becomes meaningless.

jianminzhao avatar Mar 28 '24 18:03 jianminzhao

The problem is solved though, so what is the issue? If you want to write more tests to try other potentially performance impacting queries that's fine.

borrrden avatar Mar 29 '24 02:03 borrrden

The purpose of this commit is to improve the performance by removing unnecessary parenthesis. The purpose of this particular test is to magnify the performance degradation of additional grammar rule. In short, this test can be removed with this commit.

jianminzhao avatar Mar 29 '24 22:03 jianminzhao