Robin Linacre

Results 91 issues of Robin Linacre

We round probability to 5 decimal places, so match probabilities for match weights above about 17 do not appear on the chart. Changes needed: unlinkables.py ``` def unlinkables_data(linker: Linker) -> list[dict[str, Any]]:...
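The rounding effect described above can be checked with a few lines of arithmetic. This is a minimal sketch, not Splink code: it assumes the usual definition of a match weight as the base-2 logarithm of the odds, and the helper names `match_weight_to_probability` and `rounds_to_one` are hypothetical.

```python
import math


def match_weight_to_probability(match_weight: float) -> float:
    """Convert a match weight (log2 odds) to a probability."""
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)


def rounds_to_one(match_weight: float, decimals: int = 5) -> bool:
    """True if the probability is indistinguishable from 1.0 after rounding."""
    return round(match_weight_to_probability(match_weight), decimals) == 1.0


# Rounding to 5 dp gives 1.0 once 1 - p falls below 0.000005,
# i.e. once the odds exceed 199999, so the cut-off weight is log2(199999).
cutoff_weight = math.log2(2 * 10**5 - 1)

for mw in (16, 17, 18, 19):
    print(mw, round(match_weight_to_probability(mw), 5), rounds_to_one(mw))
```

Under these assumptions the cut-off falls between match weights 17 and 18, which is consistent with the "about 17" figure in the issue.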

good first issue

In the codebase we often have to be careful when looking at the equality of two columns to ensure we're not comparing quoted to unquoted. I wondered whether it may...

### Discussed in https://github.com/moj-analytical-services/splink/discussions/2526
Originally posted by **medwar99** November 27, 2024
### Is your proposal related to a problem?
The labelling tool shows the Splink predictions by default,...

============================== slowest durations ===============================
42.14s call tests/test_debug_mode.py::test_debug_mode_combined_training[spark]
21.49s call tests/test_debug_mode.py::test_debug_mode_ptrrm_train[spark]
20.24s call tests/test_analyse_blocking.py::test_analyse_blocking_slow_methodology[spark]
18.65s call tests/test_debug_mode.py::test_debug_mode_u_training[spark]
17.38s call tests/test_full_example_spark.py::test_full_example_spark
14.38s call tests/test_debug_mode.py::test_debug_mode_em_training[spark]
13.66s call tests/test_debug_mode.py::test_debug_mode_profile_columns[spark]

Following #2847, now that the implicit cache has been removed, give the user the ability to manage the cache explicitly.
## Blocked ID pairs
I have removed the `materialise_blocked_pairs: bool = True` flag on...

splink_5

Following https://github.com/moj-analytical-services/splink/pull/2850 The only fiddly bit here was implementing clamping to avoid floating point errors. It's actually much easier when working with match weights. The key parts are where we...

splink_5

Following https://github.com/moj-analytical-services/splink/pull/2849 Add support for chunking and make sure it integrates correctly with caching/table management

splink_5

Following https://github.com/moj-analytical-services/splink/pull/2848 Remove salting. Note: there's [a comment](https://github.com/moj-analytical-services/splink/blob/ea372bf667df3a1b8f5b413a50a59004b1c86a4a/splink/internals/settings.py#L646-L647) in the current code indicating that we need salting for DuckDB to parallelise `linker.training.estimate_u_using_random_sampling`. I've double-checked and this is no longer...

splink_5

- Do all calculations with additive match weights rather than multiplicative Bayes factors, see [here](https://github.com/moj-analytical-services/splink/issues/1889)
- Somehow use the `splink_udfs` duckdb extension
- Consider changing how real time linking works...
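The first item rests on a simple identity: multiplying Bayes factors is equivalent to adding their base-2 logarithms (the match weights). The sketch below illustrates that identity only; the Bayes factor values, the prior odds, and the function names are hypothetical and do not come from Splink.

```python
import math


def combine_bayes_factors(prior_odds: float, bayes_factors: list[float]) -> float:
    """Multiplicative form: posterior odds = prior odds * product of Bayes factors."""
    return prior_odds * math.prod(bayes_factors)


def combine_match_weights(prior_weight: float, match_weights: list[float]) -> float:
    """Additive form: posterior weight = prior weight + sum of match weights."""
    return prior_weight + sum(match_weights)


# Hypothetical values for three comparisons and a prior.
prior_odds = 1 / 1000
bayes_factors = [12.5, 0.3, 8.0]

posterior_odds = combine_bayes_factors(prior_odds, bayes_factors)

# The same calculation expressed in match weights (log2 of each factor).
posterior_weight = combine_match_weights(
    math.log2(prior_odds), [math.log2(bf) for bf in bayes_factors]
)

# Both routes describe the same posterior, up to floating-point error.
assert math.isclose(2.0 ** posterior_weight, posterior_odds)
```

One reason to prefer the additive form is that sums of moderate-sized logarithms are less prone to overflow or underflow than long products of very large or very small factors.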

## Summary
The `_input_columns` method in `splink/internals/linker.py` (lines 186-245) is only used in one place and can be replaced with simpler existing code.
## Current Usage
The method is only...