druid icon indicating copy to clipboard operation
druid copied to clipboard

Incorrect output for UNNEST on nested arrays

Open aho135 opened this issue 8 months ago • 2 comments

We are observing incorrect output for UNNEST when applying it on nested arrays

Affected Version

32.0.1

Description

Given the following "values" table:

__time value
2025-04-24T21:20:10.000Z 1
2025-04-24T21:21:10.000Z 2
2025-04-24T21:22:10.000Z 3

And the following query:

SELECT *
FROM (
  SELECT ARRAY_AGG(ARRAY[__time, "value"], 1000) AS array_agg
  FROM "values"
)
CROSS JOIN UNNEST(array_agg) AS unnested

We are receiving this as the result:

array_agg unnested
[[1745529610000,1],[1745529670000,2],[1745529730000,3]] [1745529610000]
[[1745529610000,1],[1745529670000,2],[1745529730000,3]] null

This is the expected output:

array_agg unnested
[[1745529610000,1],[1745529670000,2],[1745529730000,3]] [1745529610000,1]
[[1745529610000,1],[1745529670000,2],[1745529730000,3]] [1745529670000,2]
[[1745529610000,1],[1745529670000,2],[1745529730000,3]] [1745529670000,3]

We are also observing similar behavior when trying to UNNEST with an array of JSON_OBJECTS:

SELECT *
FROM (
  SELECT ARRAY_AGG(JSON_OBJECT('t':__time, 'v':"value"), 1000) AS array_agg
  FROM "values"
)
CROSS JOIN UNNEST(array_agg) AS unnested

We are receiving this as the result:

array_agg unnested
[{"t":1745529610000,"v":1},{"t":1745529670000,"v":2},{"t":1745529730000,"v":3}] {"t":1745529610000,"v":null}

We are also observing incorrect results when trying to UNNEST an array of doubles:

SELECT *
FROM (
  SELECT ARRAY_AGG("value", 1000) AS array_agg
  FROM "values"
)
CROSS JOIN UNNEST(array_agg) AS unnested
array_agg unnested
null 1
null 2
null 3

Expected output:

array_agg unnested
[1, 2, 3] 1
[1, 2, 3] 2
[1, 2, 3] 3

aho135 avatar Apr 25 '25 17:04 aho135

I think this is possibly the same issue, or at least a closely related issue, as #17951. I ran both queries with "plannerStrategy": "DECOUPLED" and got the correct plan and correct results.

gianm avatar May 08 '25 05:05 gianm

Btw, here's the SQL to load the table:

REPLACE INTO "values"
OVERWRITE ALL
SELECT TIME_PARSE(ts) AS __time, "value"
FROM (VALUES('2025-04-24 21:20:10', 1), ('2025-04-24 21:21:10', 2), ('2025-04-24 21:22:10', 3)) AS t(ts, "value") 
PARTITIONED BY HOUR

gianm avatar May 08 '25 05:05 gianm