monosi
monosi copied to clipboard
Timestamp columns inferred incorrectly
Timestamp columns are incorrectly inferred on nested columns
On (at least) a bigquery dataset, when using tables that have nested/repeated columns that contain a timestamp element (i.e. column type array<timestamp, timestamp>
), the field is incorrectly inferred as being a timestamp field due to containing timestamp
in the type name.
The first encountered timestamp field is used to calculate freshness here which errors out and causes the monitor for the table to fail.
Based on the source code, similar behaviour can be expected for columns of type e.g. struct<int, timestamp>
, or or type struct/array with nested date
/datetime
-like fields
Expected behavior
I would expect one of two things to happen:
- The nested field is ignored for freshness calculation and if no other timestamp-like column is found, no freshness monitor is created for the table
- The nested field is unnested and the first field within that is used to calculate freshness. This is more complex and so more error-prone.
Steps to reproduce
- Create table with a nested timestamp/date-like field
- [Optional] create a control table without the nested structure
- Add the data source that includes the table with the nested field a. Navigate to /sources, b. Click "Create Data Source" c. Fill in credentials for the data source
- Wait for a few minutes to allow the monitors to run
- Run
docker logs <CONTAINER_NAME>
and check for progress/errors - See error -- This should look something like:
ERROR:root:(google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Invalid cast from ARRAY<STRUCT<start TIMESTAMP, end TIMESTAMP>> to TIMESTAMP at [37:25]
...
[SQL:
SELECT
CURRENT_TIMESTAMP() as `WINDOW_START`,
CURRENT_TIMESTAMP() as `WINDOW_END`,
COUNT(*) as `ROW_COUNT`,
'xx' as `TABLE_NAME`,
'xx' as `DATABASE_NAME`,
'xx' as `SCHEMA_NAME`,
...
TIMESTAMP_DIFF(MAX(CAST(windows AS TIMESTAMP)), CURRENT_TIMESTAMP(), MINUTE) AS windows___freshness
FROM xx.xx.xx;
]
(Background on this error at: https://sqlalche.me/e/14/4xp6)