Tom Baeyens comments

Results: 56 comments of Tom Baeyens

BigDecimal bug

Asked UT about the missing Python type support, as Decimal seems to be handled. https://soda-community.slack.com/archives/C01HYL8V64C/p1617709550103700?thread_ts=1615811570.096300&cid=C01HYL8V64C

Open telemetry warning

The solution is to upgrade the BigQuery version to 2.6.2 (https://github.com/apache/airflow/issues/13131). I'll update it.

Open telemetry warning

Trying to upgrade the BigQuery library caused problems on 3.9, so the change was reverted. The first step is to check whether the problem persists after the module split.

Support providing a filter (or custom query) which is used when analyzing a table

Some databases, such as Snowflake, also have sampling capabilities in SQL: https://docs.snowflake.com/en/sql-reference/constructs/sample.html E.g. `select * from testtable tablesample bernoulli (20.3);` IIRC there are other warehouses supporting the same or a similar construct.
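A minimal sketch of how such a sampling clause could be generated, assuming Snowflake-style `tablesample` syntax. The function `build_sample_query` and its parameters are hypothetical names for illustration only, not part of any existing tool, and other warehouses may need a different clause.

```python
import re

# Hypothetical helper: builds a sampled SELECT using Snowflake-style syntax.
# "bernoulli" samples row by row, "system" samples by block.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$.]*$")


def build_sample_query(table: str, percent: float, method: str = "bernoulli") -> str:
    """Return a query selecting roughly `percent` percent of the rows of `table`."""
    if not _IDENTIFIER.match(table):
        # Reject anything that is not a plain (optionally qualified) identifier.
        raise ValueError(f"unexpected table name: {table!r}")
    if method not in ("bernoulli", "system"):
        raise ValueError(f"unsupported sampling method: {method!r}")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return f"select * from {table} tablesample {method} ({percent})"


query = build_sample_query("testtable", 20.3)
```

With the inputs above, `query` holds the same statement as the example from the Snowflake docs, minus the trailing semicolon.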

Support providing a filter (or custom query) which is used when analyzing a table

If we support table-level configurations, they should go in the scan YAML file. A generic analyze filter or similar config should go in the warehouse file. I suggest we...

Support providing a filter (or custom query) which is used when analyzing a table

Pasting discussion for reference: > murat migdisoglu Yesterday at 9:52 AM > I got another question related to the analyze phase: When I run soda analyze, the DatasetAnalyzer.analyze (dataset_analyzer.py) runs...

Support providing a filter (or custom query) which is used when analyzing a table

> @tombaeyens is there a SCAN yml file? or do you mean warehouse.yml?
You're right. My bad. I realized a bit later that indeed there is no scan YAML file...

Support providing a filter (or custom query) which is used when analyzing a table

Throwing out an idea: what if we split the analyze step into 2 phases: 1) analyze tables: it would generate the scan YAML files, but not yet inspect each column...

Capture query for each metric

@AlessandroLollo Adding the SQL to the measurement is another option, but then we would duplicate the same (long) query a lot of times. As the first query combines all aggregations...
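To make the duplication concern concrete, here is a rough sketch of one alternative: store each query once and let every measurement point at it by index. The classes `ScanResult` and `Measurement` and all their fields are hypothetical names for illustration, not the project's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    metric: str
    value: float
    query_index: int  # position of the producing query in ScanResult.queries


@dataclass
class ScanResult:
    queries: List[str] = field(default_factory=list)
    measurements: List[Measurement] = field(default_factory=list)

    def add_query(self, sql: str) -> int:
        """Store the SQL text once and return its index."""
        self.queries.append(sql)
        return len(self.queries) - 1

    def sql_for(self, measurement: Measurement) -> str:
        """Look up the SQL that produced a measurement."""
        return self.queries[measurement.query_index]


result = ScanResult()
# One (long) aggregation query yields several metrics at once.
idx = result.add_query("select count(*), min(size), max(size) from testtable")
for metric, value in [("row_count", 10), ("min", 1), ("max", 9)]:
    result.measurements.append(Measurement(metric, value, idx))
```

Here three measurements share a single stored query string, so the SQL stays traceable per metric without being repeated in each measurement.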

Add support for `include` and `read` functions

Reflecting on this, my current thought is that we should go for a strategy to strictly separate the interpretation of the keys. I think that will lead to better error...