soda-sql
No error is thrown when including tests for two tables in one scan.yml file
For me it seemed intuitive to include tests for multiple tables in one scan.yml file as follows:
table_name: orders
metrics:
  - distinct
samples:
  table_limit: 50
columns:
  id:
    valid_format: uuid
    tests:
      - distinct == 99
  status:
    tests:
      - distinct == 5
table_name: customers
metrics:
  - missing_count
  - missing_percentage
  - min_length
  - distinct
samples:
  table_limit: 50
metric_groups:
  - profiling
  - duplicates
columns:
  id:
    valid_format: uuid
    tests:
      - invalid_percentage == 0
      - missing_count == 0
      - distinct == 100
  first_name:
    tests:
      - min_length > 1
  last_name:
    tests:
      - min_length > 1
      - invalid_count == 0
Doing this results in only the customers table being used, while ignoring the orders table. I think it would be nice to have support for testing multiple tables in one file, but before such functionality is implemented it would be user-friendly if a warning/error was thrown that soda currently cannot handle testing multiple tables in one file.
This happens because of PyYAML's default behaviour: when a mapping contains a duplicate key, the later value silently overwrites the earlier one. It can be resolved by writing a custom loader. Here is an example:
import yaml

# Special loader with duplicate-key checking
class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        seen_keys = []  # keys already seen in this mapping
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            assert key not in seen_keys, f"Duplicate key in Yaml File: {key}"
            seen_keys.append(key)
        return super().construct_mapping(node, deep)
And then we can use this by calling:
filename = 'soda.yml'
# Use a context manager so the file is closed after reading
with open(filename, 'r') as f:
    yaml_text = f.read()
data = yaml.load(yaml_text, Loader=UniqueKeyLoader)
The error looks like this:
AssertionError: Duplicate key in Yaml File: table_name
The error message can be customized as needed.
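As one possible customization, here is a minimal sketch that raises PyYAML's own `ConstructorError` instead of using `assert`, so the check still runs under `python -O` and the message carries the position of the duplicate key. The class name `StrictKeyLoader` and the sample `SCAN_YAML` text are made up for illustration; this is not what soda-sql ships. It also shows the default overwrite behaviour for comparison.

```python
import yaml
from yaml.constructor import ConstructorError


class StrictKeyLoader(yaml.SafeLoader):
    """Hypothetical loader: a SafeLoader that rejects duplicate mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = []  # a list, so unhashable keys do not break the check
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                # Marks let PyYAML report where the mapping and the key are
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark,
                )
            seen.append(key)
        return super().construct_mapping(node, deep)


# Shortened, illustrative version of the scan file from this issue
SCAN_YAML = """\
table_name: orders
metrics:
  - distinct
table_name: customers
metrics:
  - missing_count
"""

# Default behaviour: the last duplicate key silently wins
default_result = yaml.safe_load(SCAN_YAML)

# Strict behaviour: the duplicate is reported instead of being overwritten
try:
    yaml.load(SCAN_YAML, Loader=StrictKeyLoader)
    error_message = None
except ConstructorError as exc:
    error_message = str(exc)

print(default_result["table_name"])
print(error_message)
```

Like the loader above, this sketch inspects keys before merge keys (`<<`) are resolved, so files relying on YAML merges would need extra handling.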
@anilkulkarni87 that's a very nice approach. Would you like to open a PR with this?
@vijaykiran Yes I will work on it.
@vijaykiran I have created a draft pull request: https://github.com/sodadata/soda-sql/pull/624 I do have some questions, though. Please take a look.
I will also have to make a change at: https://github.com/sodadata/soda-sql/blob/8a75b53902615d2724ed17c6560d4ec936dc449a/core/sodasql/scan/parser.py#L75