# Remove unused `_input_columns` method from Linker

Open RobinL opened this issue 3 weeks ago • 0 comments

Summary

The `_input_columns` method in `splink/internals/linker.py` (lines 186-245) is only used in one place and can be replaced with simpler existing code.

Current Usage

The method is only called in `find_blocking_rules_below_threshold_comparison_count()`:

column_expressions = linker._input_columns(
    include_unique_id_col_names=False,
    include_additional_columns_to_retain=False,
)

Problem

  1. The docstring says it should use "columns used by the ComparisonLevels", but `_input_columns` returns all input dataframe columns
  2. `_settings_obj._columns_used_by_comparisons` already exists and does exactly what the docstring describes
  3. The 60-line method is more complex than this single call site needs
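To make the mismatch in point 1 concrete, here is a minimal sketch using toy stand-ins. The column names, the `comparisons` mapping, and the helper function are all hypothetical and are not Splink's actual API; they only illustrate how "all input columns" can differ from "columns used by comparisons".

```python
# Toy stand-ins only: none of these names come from Splink itself.
input_table_columns = ["unique_id", "first_name", "surname", "dob", "email", "notes"]

# Hypothetical comparisons: each one maps to the columns it reads.
comparisons = {
    "name_comparison": ["first_name", "surname"],
    "dob_comparison": ["dob"],
}


def columns_used_by_comparisons(comparisons):
    """Collect the distinct columns read by any comparison, in first-seen order."""
    seen = []
    for cols in comparisons.values():
        for col in cols:
            if col not in seen:
                seen.append(col)
    return seen


# Roughly what "all input columns, minus the unique id" would give.
all_input = [c for c in input_table_columns if c != "unique_id"]

# What the docstring's wording suggests instead.
used = columns_used_by_comparisons(comparisons)

# Columns that would be picked up even though no comparison reads them.
extra = [c for c in all_input if c not in used]
print(extra)
```

In this toy setup, `email` and `notes` end up in the first list but not the second, which is the kind of discrepancy the docstring wording does not account for.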

Proposed Fix

Replace lines 247-250 in `find_brs_with_comparison_counts_below_threshold.py` with:

if column_expressions is None:
    column_expressions = linker._settings_obj._columns_used_by_comparisons

Then delete the `_input_columns` method from `linker.py`.

Note

`_columns_used_by_comparisons` returns `List[str]` directly, eliminating the need for the `InputColumn`-to-string conversion loop that follows.
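As a rough illustration of why the conversion loop becomes unnecessary, the sketch below uses a stand-in class, `FakeInputColumn`, with an assumed `name` attribute. It is not Splink's real `InputColumn`, and the loop is a guess at the general shape of the existing code rather than a copy of it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FakeInputColumn:
    """Hypothetical stand-in for a column wrapper object; not Splink's class."""
    name: str


def names_via_conversion(columns: List[FakeInputColumn]) -> List[str]:
    """Unwrap each column object to its plain string name."""
    names = []
    for col in columns:
        names.append(col.name)
    return names


# Before: wrapper objects that need unwrapping before use.
wrapped = [FakeInputColumn("first_name"), FakeInputColumn("surname")]
column_names_before = names_via_conversion(wrapped)

# After: a plain list of strings can be used as-is, with no loop.
column_names_after = ["first_name", "surname"]

print(column_names_before == column_names_after)
```

If the replacement really does return plain strings, as the note says, the unwrapping step can simply be dropped at the call site.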

RobinL, Dec 05 '25 11:12