Tom Baeyens

Results 56 comments of Tom Baeyens

Asked UT about the missing Python type support, as Decimal seems to be handled. https://soda-community.slack.com/archives/C01HYL8V64C/p1617709550103700?thread_ts=1615811570.096300&cid=C01HYL8V64C

The solution is to upgrade the BigQuery library to version 2.6.2 (see https://github.com/apache/airflow/issues/13131). I'll update it.

Trying to upgrade the BigQuery library caused problems on 3.9, so the change was reverted. The first step is to check whether the problem persists after the module split.

Some databases, such as Snowflake, also have sampling capabilities in SQL (https://docs.snowflake.com/en/sql-reference/constructs/sample.html). E.g. `select * from testtable tablesample bernoulli (20.3);` IIRC, other warehouses support the same or a similar syntax.
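The sketch below shows one way a scan could emit a sampling query per warehouse. It assumes the Snowflake and PostgreSQL `TABLESAMPLE BERNOULLI (p)` form and BigQuery's `TABLESAMPLE SYSTEM (p PERCENT)` form. The helper `build_sample_query` and the `SAMPLE_TEMPLATES` mapping are hypothetical names, not part of any existing code.

```python
# Hypothetical helper: build a row-sampling query per SQL dialect.
# Templates follow each vendor's documented TABLESAMPLE syntax; verify before use.
SAMPLE_TEMPLATES = {
    # Snowflake: Bernoulli (row-level) sampling, probability given as a percentage.
    "snowflake": "select * from {table} tablesample bernoulli ({pct})",
    # PostgreSQL 9.5+: same clause shape as Snowflake.
    "postgres": "select * from {table} tablesample bernoulli ({pct})",
    # BigQuery: only block-level SYSTEM sampling, with an explicit PERCENT unit.
    "bigquery": "select * from {table} tablesample system ({pct} percent)",
}


def build_sample_query(dialect: str, table: str, pct: float) -> str:
    """Return a sampling SELECT for `table`; `table` must be a trusted identifier."""
    if not 0 <= pct <= 100:
        raise ValueError(f"pct must be between 0 and 100, got {pct}")
    template = SAMPLE_TEMPLATES.get(dialect)
    if template is None:
        raise ValueError(f"no sampling syntax known for dialect {dialect!r}")
    return template.format(table=table, pct=pct)


print(build_sample_query("snowflake", "testtable", 20.3))
```

Bernoulli sampling picks individual rows, so it is more uniform but still reads the whole table. BigQuery's SYSTEM sampling picks storage blocks, which is cheaper but less uniform.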

If we support table-level configurations, they should go in the scan YAML file. A generic analyze filter or similar config should go in the warehouse file. I suggest we...
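To make the proposed split concrete, here is a minimal sketch in which the warehouse file only decides *which* tables the analyze step looks at. Anything table-specific would live in that table's scan YAML. The `analyze`, `include` and `exclude` keys and the glob-pattern semantics are assumptions made for illustration, not existing configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical, already-parsed warehouse YAML: a generic, warehouse-wide analyze filter.
# The "analyze", "include" and "exclude" keys are illustrative assumptions.
warehouse_cfg = {"analyze": {"include": ["sales_*"], "exclude": ["*_tmp"]}}


def tables_to_analyze(table_names, warehouse_cfg):
    """Apply only the warehouse-level filter; table-level settings stay in the scan YAML."""
    flt = warehouse_cfg.get("analyze", {})
    include = flt.get("include", ["*"])  # default: every table is a candidate
    exclude = flt.get("exclude", [])
    return [
        name
        for name in table_names
        if any(fnmatchcase(name, p) for p in include)
        and not any(fnmatchcase(name, p) for p in exclude)
    ]


print(tables_to_analyze(["sales_orders", "sales_tmp", "customers"], warehouse_cfg))
# ['sales_orders']
```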

Pasting discussion for reference:

> murat migdisoglu, Yesterday at 9:52 AM
> I got another question related to the analyze phase: When I run soda analyze, the DatasetAnalyzer.analyze (dataaset_analyzer.py) runs...

> @tombaeyens is there a SCAN yml file? or do you mean warehouse.yml?

You're right, my bad. I realized a bit later that indeed there is no scan YAML file...

Throwing out an idea: what if we split the analyze step into 2 phases: 1) analyze tables, which would generate the scan YAML files but not yet inspect each column...
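Phase 1 of this idea could look roughly like the sketch below. It lists the tables, emits one scan YAML stub per table and leaves column inspection for a later phase, which is not sketched here. It uses an in-memory SQLite database so that it runs standalone. The `analyze_tables` name, the `<table>.yml` file naming and the single `table_name` key are assumptions.

```python
import sqlite3


def analyze_tables(conn):
    """Phase 1 (hypothetical): one scan YAML stub per table, without inspecting columns."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Only the table name is written; column-level analysis is deferred to a later phase.
    return {f"{name}.yml": f"table_name: {name}\n" for (name,) in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
print(analyze_tables(conn))
```

Because this phase only touches catalog metadata, it stays cheap even on large warehouses. The expensive per-column queries could then be run selectively on the tables the user keeps.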

@AlessandroLollo Adding the SQL to each measurement is another option, but then we would duplicate the same (long) query many times. As the first query combines all aggregations...
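The duplication concern can be illustrated with a small sketch. One combined aggregation query produces several measurements, and each measurement stores an index into a shared list of queries rather than its own copy of the SQL. The `Measurement` dataclass, its `query_id` field and both helper functions are hypothetical, not existing Soda SQL classes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    metric: str
    column: str
    value: object
    query_id: int  # points into a shared list of queries instead of copying the SQL


def build_aggregation_query(table, column_metrics):
    """Combine every (column, aggregate) pair into a single SELECT."""
    exprs = ", ".join(f"{fn}({col})" for col, fn in column_metrics)
    return f"SELECT {exprs} FROM {table}"


def to_measurements(column_metrics, row, query_id):
    """One measurement per aggregate, all referencing the same query."""
    return [
        Measurement(fn, col, value, query_id)
        for (col, fn), value in zip(column_metrics, row)
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (3, "b")])

metrics = [("x", "MIN"), ("x", "MAX"), ("y", "COUNT")]
queries = [build_aggregation_query("t", metrics)]  # the long SQL is stored once
row = conn.execute(queries[0]).fetchone()
measurements = to_measurements(metrics, row, query_id=0)
```

With N aggregations in the combined query, this stores the SQL once instead of N times. The cost is an extra lookup when a consumer wants the SQL behind a given measurement.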

Reflecting on this, my current thought is that we should adopt a strategy that strictly separates the interpretation of the keys. I think that will lead to better error...
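One possible reading of "strictly separate the interpretation of the keys" is that each configuration section owns a fixed set of keys. Any key outside that set is then reported rather than guessed at. The minimal sketch below follows that reading. The section names and key sets in `ALLOWED_KEYS` are assumptions for illustration only.

```python
# Hypothetical key sets: each section's keys are interpreted by exactly one owner.
ALLOWED_KEYS = {
    "scan": {"table_name", "metrics", "columns"},
    "column": {"metrics", "valid_format"},
}


def check_keys(section, cfg):
    """Report keys the section does not own, rather than guessing what they mean."""
    unknown = sorted(set(cfg) - ALLOWED_KEYS[section])
    return [f"unknown key {key!r} in {section} section" for key in unknown]


print(check_keys("scan", {"table_name": "orders", "metric": ["row_count"]}))
```

A typo such as `metric` instead of `metrics` then produces a precise message pointing at the offending key. This is an example of the better error reporting the comment anticipates.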