# Remove unused `_input_columns` method from Linker
Summary
The _input_columns method in splink/internals/linker.py (lines 186-245) is only used in one place and can be replaced with simpler existing code.
Current Usage
The method is only called in find_blocking_rules_below_threshold_comparison_count():
column_expressions = linker._input_columns(
    include_unique_id_col_names=False,
    include_additional_columns_to_retain=False,
)
Problem
- The docstring says it should use "columns used by the ComparisonLevels", but _input_columns returns all input dataframe columns
- _settings_obj._columns_used_by_comparisons already exists and does exactly what the docstring describes
- The 60-line method is overly complex for what's needed
Proposed Fix
Replace lines 247-250 in find_brs_with_comparison_counts_below_threshold.py with:
if column_expressions is None:
    column_expressions = linker._settings_obj._columns_used_by_comparisons
Then delete the _input_columns method from linker.py.
Note
_columns_used_by_comparisons returns List[str] directly, eliminating the need for the InputColumn-to-string conversion loop that follows.
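To illustrate the difference, here is a minimal, self-contained sketch that does not import splink. FakeInputColumn, FakeSettings, resolve_old and resolve_new are hypothetical stand-ins invented for this example, and the shape of the old conversion loop is an assumption rather than a copy of the real code. It only shows why defaulting to the comparison columns both narrows the result and removes the conversion step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeInputColumn:
    """Hypothetical stand-in for splink's InputColumn; only a name is modelled."""
    name: str


@dataclass
class FakeSettings:
    """Hypothetical stand-in for the settings object held by the linker."""
    comparison_columns: List[str]

    @property
    def _columns_used_by_comparisons(self) -> List[str]:
        # Mirrors the attribute named in the issue: already a list of strings.
        return list(self.comparison_columns)


def resolve_old(
    column_expressions: Optional[list],
    input_columns: List[FakeInputColumn],
) -> List[str]:
    # Assumed old shape: default to every input column, then convert
    # column objects to strings in a follow-up loop.
    if column_expressions is None:
        column_expressions = input_columns
    return [
        c.name if isinstance(c, FakeInputColumn) else c
        for c in column_expressions
    ]


def resolve_new(
    column_expressions: Optional[List[str]],
    settings: FakeSettings,
) -> List[str]:
    # Proposed shape: default to the columns the comparisons actually use.
    # These are strings already, so no conversion loop is needed.
    if column_expressions is None:
        column_expressions = settings._columns_used_by_comparisons
    return column_expressions


if __name__ == "__main__":
    all_cols = [FakeInputColumn(n) for n in ["first_name", "surname", "dob", "city"]]
    settings = FakeSettings(["first_name", "surname", "dob"])
    print(resolve_old(None, all_cols))   # includes "city", which no comparison uses
    print(resolve_new(None, settings))   # only the comparison columns
```

In this sketch, an explicitly supplied column_expressions list passes through resolve_new unchanged, so callers that already pass their own expressions would be unaffected by the change.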