splink
[FEAT] Duplicated code in `linker.X_from_labels_Y()`
Is your proposal related to a problem?
Several methods in `linker.py` duplicate a lot of code by existing as separate functions, `X_from_labels_table` and `X_from_labels_column`, where X is:
- `prediction_errors`
- `truth_space_table`
- `roc_chart`
- `precision_recall_chart`
- `accuracy_chart`
- ~~`confusion_matrix`~~ (DELETED)
- `threshold_selection_tool`
Together, these functions contribute almost 1000 lines to `linker.py`.
Describe the solution you'd like
Adding an argument to distinguish between labels held in the source data and labels held in a separate table would allow for simpler function names and almost halve the lines of code by removing duplication. The chart functions mostly differ only in whether they call `truth_space_table_from_labels_table` or `truth_space_table_from_labels_column` to perform the same task.
For example, `linker.roc_chart_from_labels_table("labels")` becomes something like `linker.roc_chart("labels", from="table")`.
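A rough sketch of how the consolidation might look is below. It is not splink's actual code: `LinkerSketch`, the private helper names, and the return values are hypothetical stand-ins. The argument is called `labels_from` here rather than `from`, because `from` is a reserved keyword in Python and cannot be used as a parameter name.

```python
class LinkerSketch:
    """Hypothetical stand-in for the real Linker class, for illustration only."""

    # Placeholders for the two existing code paths. In the real linker.py
    # these would hold the current *_from_labels_table / *_from_labels_column
    # logic; here they just return a description of what they would do.
    def _truth_space_from_table(self, labels_table_name):
        return f"truth space from table {labels_table_name!r}"

    def _truth_space_from_column(self, labels_column_name):
        return f"truth space from column {labels_column_name!r}"

    def truth_space_table(self, labels, labels_from="table"):
        # Single public entry point: the only branch on the labels source
        # lives here, so the chart methods below need no duplication.
        if labels_from == "table":
            return self._truth_space_from_table(labels)
        if labels_from == "column":
            return self._truth_space_from_column(labels)
        raise ValueError(
            f"labels_from must be 'table' or 'column', got {labels_from!r}"
        )

    def roc_chart(self, labels, labels_from="table"):
        # Each chart method delegates to truth_space_table instead of
        # existing twice, once per labels source.
        data = self.truth_space_table(labels, labels_from=labels_from)
        return {"chart": "roc", "data": data}


linker = LinkerSketch()
table_chart = linker.roc_chart("labels", labels_from="table")
column_chart = linker.roc_chart("is_match", labels_from="column")
```

With this shape, the other `X` methods in the list would follow the same pattern as `roc_chart`, each taking `labels` and `labels_from` and delegating to `truth_space_table`.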
Additional context
You could argue that many of these methods are no longer required once #2003 is merged.