reladiff Feature Request: Add an option to pass '*' to `extra_columns` in order to compare all columns in the tables.

Feature Request: Add an option to pass '*' to `extra_columns` in order to compare all columns in the tables.

Open MiConnell opened this issue 10 months ago • 4 comments

Is your feature request related to a problem? Please describe. When working on table migrations, it's very time-consuming to have to get the schema for each table, and pass all columns into reladiff.

Describe the solution you'd like I would love an option to pass '*' to extra_columns and have reladiff infer the schemas automatically for matching column names.

Describe alternatives you've considered At the very least it would be helpful if somewhere in the api the full table schema was exposed. Something identical to TableSegment.with_schema()._schema except with all columns in the table.

Jan 17 '25 23:01 MiConnell

You can do this from the command line using -c %.

But I'm all for adding a way to do it through the Python API too.

Something identical to TableSegment.with_schema()._schema except with all columns in the table.

Does _schema not contain all the columns?

Jan 17 '25 23:01 erezsh

Good to know about the CLI. Just adding an option to pass '%' to the extra columns via a config yaml file would be enough for my usecase.

No, _schema is just a combination of key column(s) and extra column(s) when using the python API and a yaml config.


print(table1.with_schema())

TableSegment(
  database=<reladiff.databases.redshift.Redshift object at 0x1033cadf0>,
  table_path=('dev', 'table_name'), key_columns=('header_id', 'line_id'),
  update_column=None,
  extra_columns=('header_desc', 'amount'),
  min_key=None,
  max_key=None,
  min_update=None,
  max_update=None,
  where='1 = 1',
  case_sensitive=True,
  _schema={
    'header_id': Integer(precision=0, python_type=<class 'int'>),
    'header_desc': Text(),
    'line_id': Decimal(precision=0),
    'amount': Float(precision=13)
  }
)

Jan 17 '25 23:01 MiConnell

I suppose you could do something like this -

    new_ts: TableSegment = ts.replace(key_columns=tuple(set(ts._schema)-set(ts.key_columns))

Jan 18 '25 14:01 erezsh

For now, i have a preprocess step, which does rowcounts, plus retrieves schema from both source & target, then uses the common columns. I have a separate function but you should be able to utilize Database.select_table_schema function in sqeleton, which will query information schema for column names.

common_columns = [k for k in source_columns.keys() if k in target_columns.keys()]

Note, that you have to remove primary key(s) from this list, then just pass this information in the extra_columns param.

You can also use sets if you want:

source_cols=set()
target_cols=set()
primary_keys=set()
excluded_cols=set()
extra_cols = source_cols.intersection(target_cols) - primary_keys - excluded_cols

Mar 29 '25 09:03 vmatt

reladiff reladiff copied to clipboard

Feature Request: Add an option to pass '*' to `extra_columns` in order to compare all columns in the tables.

reladiff
reladiff copied to clipboard