
Feature/add single comparison column validation check

Open ThomasHepworth opened this issue 9 months ago • 2 comments

Type of PR

  • [ ] BUG
  • [x] FEAT
  • [ ] MAINT
  • [ ] DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Adds a validation check that flags any user-supplied input data frame containing only a single comparison column. See https://github.com/moj-analytical-services/splink/issues/1392.

Give a brief description for the solution you have provided

This function adds a basic check to determine whether any of the user's data frames contain only one comparison column after excluding the unique_id and source_dataset columns.

Where this is true, Splink will generate invalid SQL, leading to cryptic SQL errors that may be difficult to interpret.

NB: this check only activates if the user has correctly specified unique_id and source_dataset. Should there be any mistakes in entering these variables in the settings, other validation checks will identify and highlight these errors.
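As a rough illustration of the logic described above, here is a minimal Python sketch. It is not the code in this PR: the function name `find_single_comparison_column_frames`, its parameters, and the representation of each input as a plain list of column names are all hypothetical, chosen only to show the counting step. Skipping frames that lack the unique ID column is likewise an assumption standing in for "other validation checks will handle it".

```python
from typing import Dict, List


def find_single_comparison_column_frames(
    frames: Dict[str, List[str]],
    unique_id_col: str = "unique_id",
    source_dataset_col: str = "source_dataset",
) -> List[str]:
    """Return the names of input frames with exactly one comparison column.

    `frames` maps a frame name to its list of column names (hypothetical
    representation, not Splink's internal one).
    """
    reserved = {unique_id_col, source_dataset_col}
    flagged = []
    for name, columns in frames.items():
        # Assumption: if the unique ID column is missing, the settings are
        # misspecified and a different check should report that instead.
        if unique_id_col not in columns:
            continue
        # Everything that is not a reserved column counts as a comparison column.
        comparison_cols = [c for c in columns if c not in reserved]
        if len(comparison_cols) == 1:
            flagged.append(name)
    return flagged


example_frames = {
    "df_a": ["unique_id", "first_name"],
    "df_b": ["unique_id", "source_dataset", "first_name", "dob"],
    "df_c": ["id", "first_name"],
}
flagged_frames = find_single_comparison_column_frames(example_frames)
```

In this sketch only `df_a` is flagged: `df_b` has two comparison columns, and `df_c` is skipped because it lacks the unique ID column. A real check would presumably raise or log a descriptive error for each flagged frame rather than return a list.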

PR Checklist

  • [ ] Added documentation for changes
  • [ ] Added feature to example notebooks or tutorial (if appropriate)
  • [x] Added tests (if appropriate)
  • [ ] Updated CHANGELOG.md (if appropriate)
  • [x] Made changes based off the latest version of Splink
  • [x] Run the linter
  • [ ] Run the spellchecker (if appropriate)

ThomasHepworth, May 01 '24 10:05

Ah oops, I forgot to check if any of our tests are affected by this logic...

ThomasHepworth, May 01 '24 10:05

@RobinL could you let me know whether you even want this change in Splink 3? I can port the code to v4 and skip this version if preferred.

ThomasHepworth, May 01 '24 13:05