reladiff
reladiff copied to clipboard
Feature Request: Add an option to pass '*' to `extra_columns` in order to compare all columns in the tables.
Is your feature request related to a problem? Please describe. When working on table migrations, it's very time-consuming to have to get the schema for each table, and pass all columns into reladiff.
Describe the solution you'd like
I would love an option to pass '*' to extra_columns and have reladiff infer the schemas automatically for matching column names.
Describe alternatives you've considered
At the very least it would be helpful if somewhere in the api the full table schema was exposed. Something identical to TableSegment.with_schema()._schema except with all columns in the table.
You can do this from the command line using -c %.
But I'm all for adding a way to do it through the Python API too.
Something identical to TableSegment.with_schema()._schema except with all columns in the table.
Does _schema not contain all the columns?
Good to know about the CLI. Just adding an option to pass '%' to the extra columns via a config yaml file would be enough for my usecase.
No, _schema is just a combination of key column(s) and extra column(s) when using the python API and a yaml config.
print(table1.with_schema())
TableSegment(
database=<reladiff.databases.redshift.Redshift object at 0x1033cadf0>,
table_path=('dev', 'table_name'), key_columns=('header_id', 'line_id'),
update_column=None,
extra_columns=('header_desc', 'amount'),
min_key=None,
max_key=None,
min_update=None,
max_update=None,
where='1 = 1',
case_sensitive=True,
_schema={
'header_id': Integer(precision=0, python_type=<class 'int'>),
'header_desc': Text(),
'line_id': Decimal(precision=0),
'amount': Float(precision=13)
}
)
I suppose you could do something like this -
new_ts: TableSegment = ts.replace(key_columns=tuple(set(ts._schema)-set(ts.key_columns))
For now, i have a preprocess step, which does rowcounts, plus retrieves schema from both source & target, then uses the common columns. I have a separate function but you should be able to utilize Database.select_table_schema function in sqeleton, which will query information schema for column names.
common_columns = [k for k in source_columns.keys() if k in target_columns.keys()]
Note, that you have to remove primary key(s) from this list, then just pass this information in the extra_columns param.
You can also use sets if you want:
source_cols=set()
target_cols=set()
primary_keys=set()
excluded_cols=set()
extra_cols = source_cols.intersection(target_cols) - primary_keys - excluded_cols