databend
databend copied to clipboard
bug: aggregate scoping got unexpected results
Search before asking
- [X] I had searched in the issues and found no similar issues.
Version
v1.2.208-nightly
What's Wrong?
I tried to replicate the tricky cases mentioned in the blog, and found that our aggregation behave differently from duckdb or cockroach preparation
CREATE TABLE aa (a int);
INSERT INTO aa VALUES (1), (2), (3)
CREATE Table xx(x int);
INSERT INTO xx VALUES (10), (20), (30)
SQL:
SELECT (SELECT sum(a+x) FROM xx LIMIT 1) FROM aa;
databend result (bendsql cli)
┌───────────────────────────────────────┐
│ (select sum((a + x)) from xx limit 1) │
│ Nullable(Int64) │
├───────────────────────────────────────┤
│ 63 │
│ NULL │
│ NULL │
└───────────────────────────────────────┘
duckdb result
┌───────────────────────────────────────┐
│ (SELECT sum((a + x)) FROM xx LIMIT 1) │
│ int128 │
├───────────────────────────────────────┤
│ 63 │
│ 66 │
│ 69 │
└───────────────────────────────────────┘
SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
databend result error: APIError: ResponseError with 1006: Unable to get field named "0". Valid fields: []
duckdb result
┌─────────────────────────────────┐
│ (SELECT sum(a) FROM xx LIMIT 1) │
│ int128 │
├─────────────────────────────────┤
│ 6 │
└─────────────────────────────────┘
the rest of the SQL
SELECT (SELECT sum(1) FROM xx LIMIT 1) FROM aa;
SELECT (SELECT sum(x) FROM xx LIMIT 1) FROM aa;
has the same result
How to Reproduce?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Hi I'm new here, I am interested in this issue, but I am not familiar with this project. Apart from EXPLAIN
, is there any other way to see the data flow of SQL execution?
@suimenno3002 Hello, and thank you for your interest in this issue.
For your question, EXPLAIN is ok? With the latest version(v1.2.132-nightly), the explain like:
-[ EXPLAIN ]-----------------------------------
EvalScalar
├── output columns: [(select sum((a + x)) from xx limit 1) (#4)]
├── expressions: [scalar_subquery_3 (#3)]
├── estimated rows: 3.00
└── HashJoin
├── output columns: [sum((a + x)) (#3)]
├── join type: LEFT SINGLE
├── build keys: [a (#5)]
├── probe keys: [a (#0)]
├── filters: []
├── estimated rows: 3.00
├── Limit(Build)
│ ├── output columns: [sum((a + x)) (#3), a (#5)]
│ ├── limit: 1
│ ├── offset: 0
│ ├── estimated rows: 0.00
│ └── AggregateFinal
│ ├── output columns: [sum((a + x)) (#3), a (#5)]
│ ├── group by: [a]
│ ├── aggregate functions: [sum(sum_arg_0)]
│ ├── limit: 1
│ ├── estimated rows: 0.00
│ └── AggregatePartial
│ ├── output columns: [sum((a + x)) (#3), #_group_by_key]
│ ├── group by: [a]
│ ├── aggregate functions: [sum(sum_arg_0)]
│ ├── estimated rows: 0.00
│ └── EvalScalar
│ ├── output columns: [a (#5), sum_arg_0 (#2)]
│ ├── expressions: [a (#5) + xx.x (#1)]
│ ├── estimated rows: 0.00
│ └── HashJoin
│ ├── output columns: [xx.x (#1), a (#5)]
│ ├── join type: CROSS
│ ├── build keys: []
│ ├── probe keys: []
│ ├── filters: []
│ ├── estimated rows: 0.00
│ ├── AggregateFinal(Build)
│ │ ├── output columns: [a (#5)]
│ │ ├── group by: [a]
│ │ ├── aggregate functions: []
│ │ ├── estimated rows: 0.00
│ │ └── AggregatePartial
│ │ ├── output columns: [#_group_by_key]
│ │ ├── group by: [a]
│ │ ├── aggregate functions: []
│ │ ├── estimated rows: 0.00
│ │ └── TableScan
│ │ ├── table: default.checksb.aa
│ │ ├── output columns: [a (#5)]
│ │ ├── read rows: 3
│ │ ├── read bytes: 44
│ │ ├── partitions total: 1
│ │ ├── partitions scanned: 1
│ │ ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1, bloom pruning: 0 to 0>]
│ │ ├── push downs: [filters: [], limit: NONE]
│ │ └── estimated rows: 0.00
│ └── TableScan(Probe)
│ ├── table: default.checksb.xx
│ ├── output columns: [x (#1)]
│ ├── read rows: 3
│ ├── read bytes: 44
│ ├── partitions total: 1
│ ├── partitions scanned: 1
│ ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1, bloom pruning: 0 to 0>]
│ ├── push downs: [filters: [], limit: NONE]
│ └── estimated rows: 3.00
└── TableScan(Probe)
├── table: default.checksb.aa
├── output columns: [a (#0)]
├── read rows: 3
├── read bytes: 44
├── partitions total: 1
├── partitions scanned: 1
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1, bloom pruning: 0 to 0>]
├── push downs: [filters: [], limit: NONE]
└── estimated rows: 3.00
And the result is:
bohu@(askbend-small)/checksb> SELECT (SELECT sum(a+x) FROM xx LIMIT 1) FROM aa;
SELECT
(
SELECT
sum(a + x)
FROM
xx
LIMIT
1
)
FROM
aa
┌───────────────────────────────────────┐
│ (select sum((a + x)) from xx limit 1) │
│ Nullable(Int64) │
├───────────────────────────────────────┤
│ 63 │
│ NULL │
│ NULL │
└───────────────────────────────────────┘
But the snowflake returns an error(Databend does not follow the duckdb or cockroach):
Unsupported subquery type cannot be evaluated
@BohuTANG
Thanks for your reply, I found the following statement in the snowflake documentation:
The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery.
So if we want to align the behavior of snowflake in this situation, maybe we can give a clear error message like snowflake.
The issue is a correlated subquery for aggregate. It can be a bit complex, especially for the beginner in databend.
I need to evaluate it and decide how to solve (support or report error) the issue.
Thanks for your interest @suimenno3002 !