Sam Lindsay

Results 12 issues of Sam Lindsay

Individual TF adjustments are shown on the waterfall chart, but it might be useful to have a chart showing the full range of TF adjustments for a column, relative to the match...

enhancement

![image](https://user-images.githubusercontent.com/7570107/136946496-8769b06c-e4a6-488d-95f1-526946d96aa7.png)

Generate a case expression with 3-5 levels:

```
CASE
  WHEN {full_match} THEN 4
  WHEN {sector_match} THEN 3
  WHEN {district_match} THEN 2
  WHEN {area_match} THEN 1
  ELSE 0
END
```
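One way the case expression above could be generated is sketched below. This is only an illustration: `build_case_expression` is a hypothetical helper, not part of splink, and it assumes the conditions are passed strictest first, with the `{full_match}`-style placeholders left for the caller to fill in.

```python
def build_case_expression(conditions):
    """Build a SQL CASE expression from conditions ordered strictest first.

    Hypothetical helper for illustration only. The first condition gets the
    highest level, each later condition one level lower, and rows matching
    no condition fall through to level 0.
    """
    top_level = len(conditions)
    lines = ["CASE"]
    for offset, condition in enumerate(conditions):
        lines.append(f"  WHEN {condition} THEN {top_level - offset}")
    lines.append("  ELSE 0")
    lines.append("END")
    return "\n".join(lines)


# Placeholders copied from the issue text; four conditions give five levels (4..0).
conditions = ["{full_match}", "{sector_match}", "{district_match}", "{area_match}"]
case_sql = build_case_expression(conditions)
print(case_sql)
```

Passing two or three conditions instead would give the 3- or 4-level variants.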

It is useful to show the top N values in order of frequency, but this doesn't help to identify outliers or discontinuities in the distribution of values (such as dates...

good first issue
profiling
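To illustrate the profiling point above, here is a rough sketch of counting date values in value order rather than frequency order, so that empty periods show up as zeros. The issue text is truncated, so this is not the proposed implementation; `monthly_counts` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Count values per calendar month, ordered by month, including empty months.

    Hypothetical helper for illustration only. A top-N-by-frequency view would
    drop the empty months, which is exactly where a discontinuity is visible.
    """
    counts = Counter((d.year, d.month) for d in dates)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    ordered = []
    year, month = first
    while (year, month) <= last:
        # Counter returns 0 for months with no values
        ordered.append(((year, month), counts[(year, month)]))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ordered


# Made-up sample: three values in January 2020, then nothing until April 2020.
sample_dates = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 15), date(2020, 4, 3)]
profile = monthly_counts(sample_dates)
for (year, month), n in profile:
    print(f"{year}-{month:02d}: {n}")
```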

### Type of PR

- [ ] BUG
- [x] FEAT
- [ ] MAINT
- [ ] DOC

### Is your Pull Request linked to an existing Issue or...

### Is your proposal related to a problem?

Several methods in `linker.py` duplicate a lot of code by having separate functions, `X_from_labels_table` and `X_from_labels_column`, where X is:

- `prediction_errors`
-...

enhancement

Wrapping up a few other issues:

- [x] #969
- [ ] #856
- [ ] #131

And suggesting more:

- **Profiling continuous/date columns (i.e. distributions rather than sorted histograms)**...

good first issue
profiling

### Is your proposal related to a problem?

The [Unlinkables chart](https://moj-analytical-services.github.io/splink/charts/unlinkables_chart.html) currently covers all records, without distinguishing between the different datasets in a link job. One dataset may have very different...

enhancement
good first issue
charts

### Is your proposal related to a problem?

Related to discussion in https://github.com/moj-analytical-services/data_linking/issues/329 and #1677.

Cluster IDs in splink used to be integers and are now (v3) the min ID...

enhancement
clustering

### Is your proposal related to a problem?

Attempting to build a comparison for a date column _without_ a `datediff_level`? (i.e. exact, Damerau-Levenshtein, else)

```python
cl.damerau_levenshtein_at_thresholds("date_col", [1, 2])
```

:x: date...

enhancement

### Is your proposal related to a problem?

Intuition around `probability_two_random_records_match` is limited as it's just _a very small number_ 🤷

Hard to sense check a model when the prior...

enhancement
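As a rough illustration of the point above about `probability_two_random_records_match`, the same prior can be restated in forms that are easier to sense check: "1 in N" pairs, the prior match weight (log2 of the prior odds), and the expected number of matching pairs when deduplicating a dataset. The issue text is truncated, so this is not the proposal itself; `describe_prior` and the example numbers are hypothetical.

```python
import math


def describe_prior(probability_two_random_records_match, n_records):
    """Restate a match prior in more intuitive terms.

    Hypothetical helper for illustration only; assumes a single dataset being
    deduplicated, so the number of pairwise comparisons is n * (n - 1) / 2.
    """
    p = probability_two_random_records_match
    if not 0 < p < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    n_pairs = n_records * (n_records - 1) / 2
    return {
        "one_in_n_pairs": 1 / p,                         # "1 in N pairs is a match"
        "prior_match_weight": math.log2(p / (1 - p)),    # log2 of the prior odds
        "expected_matching_pairs": p * n_pairs,
    }


# Made-up example values: a prior of 0.0001 on a dataset of 1,000 records.
summary = describe_prior(0.0001, n_records=1000)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Comparing the expected number of matching pairs with what is known about the data (for example, a rough duplicate rate) is one way to check whether the prior is plausible.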