splink
splink copied to clipboard
[MAINT] Ensure consistent capitalisation when referencing functions named after people
Is your proposal related to a problem?
While reviewing spelling errors identified by the spellchecker, @zslade and I have noticed that we're very inconsistent when it comes to capitalising names, when used in functions class names.
For example, both ctl.name_comparison
and cl.levenshtein_at_thresholds
present these functions in lowercase:
import splink.duckdb.duckdb_comparison_library as cl
import splink.duckdb.comparison_template_library as ctl
print(ctl.name_comparison("first_name", phonetic_col_name = "first_name_dm"))
print(cl.levenshtein_at_thresholds("a"))
producing the following outputs:
Comparison 'Exact match vs. Names with phonetic exact match vs. First_Name within levenshtein threshold 1 vs. First_Name within damerau-levenshtein threshold 1 vs. First_Name within jaro_winkler thresholds 0.9, 0.8 vs. anything else' of "first_name" and "first_name_dm".
Similarity is assessed using the following ComparisonLevels:
- 'Null' with SQL rule: "first_name_l" IS NULL OR "first_name_r" IS NULL
- 'Exact match first_name' with SQL rule: "first_name_l" = "first_name_r"
- 'Exact match first_name_dm' with SQL rule: "first_name_dm_l" = "first_name_dm_r"
- 'Damerau_levenshtein <= 1' with SQL rule: damerau_levenshtein("first_name_l", "first_name_r") <= 1
- 'Jaro_winkler_similarity >= 0.9' with SQL rule: jaro_winkler_similarity("first_name_l", "first_name_r") >= 0.9
- 'Jaro_winkler_similarity >= 0.8' with SQL rule: jaro_winkler_similarity("first_name_l", "first_name_r") >= 0.8
- 'All other comparisons' with SQL rule: ELSE
<Comparison Exact match vs. A within levenshtein thresholds 1, 2 vs. anything else with 4 levels at 0x106d88c20>
Describe the solution you'd like
To achieve consistency and adhere to standard conventions in our documentation and the wider literature, we propose capitalizing the names in such instances.
Additionally, we've noticed occasional partial capitalisation, notably with cases such as "jaro-winkler", due to the use of the capitalize()
method. This only modifies the first letter of the string, leading to outputs like Jaro-winkler
from "jaro-winkler".capitalize()
.