Implement `cross_tab` operation for Snowflake

Open radeusgd opened this issue 1 year ago • 0 comments

Currently cross_tab is only implemented in-memory.

We want to implement it for the Snowflake backend using https://docs.snowflake.com/en/sql-reference/constructs/pivot

The biggest challenge is that the set of output columns is 'dynamic' - it depends on the data.

In DB_Table we don't immediately materialize the queries, but instead we are creating 'schemes' that are used to build the final query to be run once data is demanded out of it. cross_tab should behave the same way.

However, it will need to run at least a partial query, to be able to figure out the column names for the resulting table.

Dynamic column set

Important challenge to solve is related to the following scenario:

we call t2 = t1.cross_tab ...
the columns that t2 has are computed based on the data at the moment of constructing this DB_Table instance,
we construct t3 by further transforming t2 (transforming columns, joining etc.)
later we finally materialize t3, e.g. by t3.read.

Between steps (1) and (4) the data in the Database may change, meaning that the columns that cross_tab should have returned may be different than what was returned when step (1) was executed. Thus some new columns may have appeared and also some existing columns may disappear, but the t3 was constructed based on the 'old' column set, thus it should actually be 'invalidated'.

Most likely, we should detect such a situation (they should be relatively rare) and in that case raise an error when materializing the table in step (4). The error should ask the user to re-run the whole part of the workflow: if we re-run all steps 1-4 again, hoping that this time there won't be any data modifications in the meantime, the 4th step will be able to return a result consistent with the partial result from 1.

Oct 10 '24 15:10 radeusgd