metabase-clickhouse-driver icon indicating copy to clipboard operation
metabase-clickhouse-driver copied to clipboard

JSON unfolding / nested objects

Open atillier opened this issue 2 years ago • 7 comments

Describe the bug

Nested JSON object are not rendered / queryable

Steps to reproduce

  1. Load a Clickhouse table containing nested objects
  2. See the table / try to query it

Expected behaviour

Data able to be displayed

Error log

Configuration

Environment

  • metabase-clickhouse-driver version: 1.2.2
  • Metabase version: 47.3
  • OS: MacOS

ClickHouse server

  • ClickHouse Server version: 23.9.1.1653
  • CREATE TABLE statements for tables involved:
CREATE or REPLACE TABLE default."arm-dev.bet-updates.json"
(
    id Int32,
    minStakeLimit Decimal(11, 2),
    maxStakeLimit Decimal(11, 2),
    numberCombinations Int32,
    legs Nested
    (
        id Int32,
        legNumber Int32,
        legType String,
        parts Nested
        (
            id Int32,
        ),
        errors Array(String)
    ),
    errors Array(String),
    creationTime Float64,
    customer Tuple
    (
        id Int32,
        tenantId Int32,
        sportsbookRef String,
        stakeFactor Decimal(11, 2),
        testing Nullable(UInt8)
    )
)
  • Sample data for all these tables, use [clickhouse-obfuscator]
select id, minStakeLimit, maxStakeLimit, numberCombinations, legs, errors, creationTime, customer from "arm-dev.bet-updates.json" order by id, creationTime;

SELECT
    id,
    minStakeLimit,
    maxStakeLimit,
    numberCombinations,
    legs,
    errors,
    creationTime,
    customer
FROM `arm-dev.bet-updates.json`
ORDER BY
    id ASC,
    creationTime ASC

Query id: 70a0bb6a-f5f7-4922-9376-2420898ebcaf

┌──id─┬─minStakeLimit─┬─maxStakeLimit─┬─numberCombinations─┬─legs──────────────────────────────────────────────────────────────────────┬─errors─┬───────creationTime─┬─customer────────────┐
│ 252 │           0.1 │          1000 │                  1 │ [(252,1,'WIN',[(430,1,'WIN','SP',1436808,241272,18935,143,5,1,true)],[])] │ []     │ 1695802093.3665981 │ (1,1,'one_gbp',1,0) │
│ 253 │           0.1 │          1000 │                  1 │ [(253,1,'WIN',[(431,1,'WIN','SP',1435945,241116,18922,268,5,1,true)],[])] │ []     │ 1695806272.1980581 │ (1,1,'one_gbp',1,0) │
│ 254 │           0.1 │          1000 │                  1 │ [(254,1,'WIN',[(432,1,'WIN','SP',1435947,241116,18922,268,5,1,true)],[])] │ []     │ 1695806570.9034572 │ (1,1,'one_gbp',1,0) │
│ 255 │           0.1 │          1000 │                  1 │ [(255,1,'WIN',[(433,1,'WIN','SP',1436808,241272,18935,143,5,1,true)],[])] │ []     │  1695806939.310054 │ (1,1,'one_gbp',1,0) │
│ 256 │           0.1 │          1000 │                  1 │ [(256,1,'WIN',[(434,1,'WIN','SP',1435945,241116,18922,268,5,1,true)],[])] │ []     │ 1695808316.8749661 │ (1,1,'one_gbp',1,0) │
│ 257 │           0.1 │          1000 │                  1 │ [(257,1,'WIN',[(435,1,'WIN','SP',1435945,241116,18922,268,5,1,true)],[])] │ []     │   1695813181.35838 │ (1,1,'one_gbp',1,0) │

See how it renders: Screenshot 2023-10-09 at 17 06 06

And how I can't query it:

Screenshot 2023-10-09 at 17 06 37

atillier avatar Oct 09 '23 16:10 atillier

This looks like a feature request. I see that clickhouse-jdbc now supports Nested. We will check if it's possible to map it within Metabase properly, as the generated queries should be quite unusual (like that legs.id bit).

slvrtrn avatar Oct 09 '23 16:10 slvrtrn

If Metabase cannot support nested typed structures, then theoretically, during the introspection phase, we can recursively flatten it for starters, e.g.

CREATE or REPLACE TABLE default."arm-dev.bet-updates.json"
(
    id Int32,
    legs Nested
    (
        id Int32,
        legNumber Int32,
        parts Nested
        (
            id Int32,
        )
    )
);

This could be introspected by Metabase driver as an equivalent of:

CREATE or REPLACE TABLE default."arm-dev.bet-updates.json"
(
    `id` Int32,
    `legs.id` Int32
    `legs.legNumber` Int32,
    `legs.parts.id` Int32
);

because something like

SELECT `legs.legNumber`, `legs.parts.id` FROM default."arm-dev.bet-updates.json";

is actually a valid query.

However, I need to verify that this approach flies with the actual generated queries from within Metabase itself, so no promises here yet.

slvrtrn avatar Oct 09 '23 16:10 slvrtrn

Regarding the Metabase error: isn't it the same in clickhouse-client for the given query?

SELECT id
FROM default.`arm-dev.bet-updates.json`
WHERE legs.id = 42
0 rows in set. Elapsed: 0.002 sec. 

Received exception from server (version 23.8.3):
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Illegal types of arguments (Array(Int32), UInt8) of function equals: While processing legs.id = 42. (ILLEGAL_TYPE_OF_ARGUMENT)

slvrtrn avatar Oct 09 '23 16:10 slvrtrn

Thanks @slvrtrn !

Yes you are very right, Clickhouse Client is also erroring!

atillier avatar Oct 09 '23 18:10 atillier

The WHERE clause operations should assume you are actually working with an array when filtering by nested column.

See how Goal.ID is used in WHERE clause here:

SELECT
    Goals.ID,
    Goals.EventTime
FROM test.visits
WHERE CounterID = 101500 AND length(Goals.ID) < 5
LIMIT 10

Regarding the implementation, we also have the :nested-field-columns Metabase feature which is used for JSONs and similar stuff, but again, because of a tricky WHERE clause, I am not entirely sure whether it will fit.

slvrtrn avatar Oct 09 '23 18:10 slvrtrn

I suppose that before any fix is implemented I can explode my JSON in Clickhouse, maybe via a view, right?

atillier avatar Oct 10 '23 08:10 atillier

@atillier, yes, a view could work fine in this case.

slvrtrn avatar Oct 19 '23 23:10 slvrtrn