splink
Feature/add single comparison column validation check
Type of PR
- [ ] BUG
- [x] FEAT
- [ ] MAINT
- [ ] DOC
Is your Pull Request linked to an existing Issue or Pull Request?
Adds a validation check for whether any input data frame supplied by the user has only a single comparison column - https://github.com/moj-analytical-services/splink/issues/1392.
Give a brief description for the solution you have provided
This PR adds a basic check to determine whether any of the user's data frames contains only one comparison column once the unique_id and source_dataset columns are excluded.
Where this is true, Splink will generate invalid SQL, leading to cryptic SQL errors that may be difficult to interpret.
NB: this check only activates if the user has correctly specified unique_id and source_dataset. Should there be any mistakes in entering these variables in the settings, other validation checks will identify and highlight these errors.
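For reviewers, the logic described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `find_single_comparison_column_frames` and the dict-of-column-lists input are hypothetical, and the default column names are assumed to be `unique_id` and `source_dataset`.

```python
def find_single_comparison_column_frames(
    frames,
    unique_id="unique_id",
    source_dataset="source_dataset",
):
    """Return (frame_name, column) pairs for frames with one comparison column.

    `frames` is a hypothetical mapping of frame name -> list of column names;
    it stands in for whatever input frames the linker was given.
    """
    excluded = {unique_id, source_dataset}
    flagged = []
    for name, columns in frames.items():
        # Comparison columns are whatever is left after dropping the id columns.
        remaining = [c for c in columns if c not in excluded]
        if len(remaining) == 1:
            flagged.append((name, remaining[0]))
    return flagged


frames = {
    "df_left": ["unique_id", "source_dataset", "first_name"],
    "df_right": ["unique_id", "source_dataset", "first_name", "surname"],
}
flagged = find_single_comparison_column_frames(frames)
for name, column in flagged:
    print(f"{name} has only one comparison column: {column}")
```

Only `df_left` is flagged here, since `df_right` still has two columns left after the exclusions.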
PR Checklist
- [ ] Added documentation for changes
- [ ] Added feature to example notebooks or tutorial (if appropriate)
- [x] Added tests (if appropriate)
- [ ] Updated CHANGELOG.md (if appropriate)
- [x] Made changes based off the latest version of Splink
- [x] Run the linter
- [ ] Run the spellchecker (if appropriate)
Ah oops, I forgot to check if any of our tests are affected by this logic...
@RobinL could you let me know if you even want this change in Splink3? I can port the code to v4 and skip this version if preferred.