Robin Linacre

Results 67 issues of Robin Linacre

Closes #7460. ## Fix The error comes about because the `title` property of the `x` encoding is further manipulated into a vega expression for the `tooltip` and `description` signals in...

DuckDB provides a way of registering a C++ extension. It would be good to have an example of e.g. the Jaro-Winkler function to show how this is done.

In Spark, we see poor parallelisation of this: SQL is:

```
22/08/03 13:30:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using...
```

enhancement

It would be useful to have a page in the developers' guidance explaining how to build the documentation locally:

```
git clone https://github.com/moj-analytical-services/splink_demos.git demos/
mkdir docs/settingseditor
curl https://raw.githubusercontent.com/moj-analytical-services/splink/splink3/splink/files/settings_jsonschema.json --output docs/settingseditor/settings_jsonschema.json
...
```

Supposing a model refers to `dmeta_first_name` in `blocking_rules_to_generate_predictions`, but

- does not use `dmeta_first_name` in any of the `comparison`s
- does not include `dmeta_first_name` in `additional_columns_to_retain`

Then Splink will fail...
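The condition described above can be checked before running a model. The following is only a rough sketch, not Splink's own validation: the helper names are hypothetical, and it assumes blocking rules reference columns as `l.<column>` and `r.<column>` and that the columns used by the comparisons are supplied as a flat list.

```python
import re


def columns_in_blocking_rule(rule):
    """Return column names referenced as l.<name> or r.<name> (assumed convention)."""
    return set(re.findall(r"\b[lr]\.(\w+)", rule))


def missing_blocking_columns(blocking_rules, comparison_columns, additional_columns_to_retain):
    """Columns used by blocking rules but not carried through by any other setting."""
    available = set(comparison_columns) | set(additional_columns_to_retain)
    needed = set()
    for rule in blocking_rules:
        needed |= columns_in_blocking_rule(rule)
    return needed - available


# Hypothetical settings mirroring the situation in the issue.
missing = missing_blocking_columns(
    blocking_rules=["l.dmeta_first_name = r.dmeta_first_name"],
    comparison_columns=["first_name", "surname"],
    additional_columns_to_retain=[],
)
print(missing)
```

A non-empty result flags columns that would need to be retained for the blocking rule to work.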

Similar to [`by_sample_size`](https://github.com/moj-analytical-services/splink/blob/5b15b06b736a1a8431f3f81b030018218fea20e0/splink/cluster_studio.py#L203) but we want to stratify by cluster density

enhancement
can_wait_until_after_3.0_release

Another way of training m is to provide splink with a deterministic rule that is considered valid (i.e. it results in 100% matches), and use this to train the m...
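To illustrate the idea, here is a minimal sketch in plain Python rather than Splink's implementation. It treats every pair satisfying a deterministic rule as a true match and takes the share of those pairs falling into each comparison level as the m estimate. The toy records, the `nino` identifier column, and all function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data; "nino" stands in for a highly reliable identifier.
records = [
    {"id": 1, "nino": "A1", "first_name": "robin"},
    {"id": 2, "nino": "A1", "first_name": "robin"},
    {"id": 3, "nino": "A1", "first_name": "rob"},
    {"id": 4, "nino": "B2", "first_name": "john"},
    {"id": 5, "nino": "B2", "first_name": "john"},
]


def deterministic_rule(left, right):
    """Pairs satisfying this rule are assumed to be true matches."""
    return left["nino"] == right["nino"]


def comparison_level(left, right):
    """A two-level comparison on first_name."""
    return "exact_match" if left["first_name"] == right["first_name"] else "else"


def estimate_m(records):
    """m = P(comparison level | match), estimated over rule-satisfying pairs."""
    counts = Counter(
        comparison_level(left, right)
        for left, right in combinations(records, 2)
        if deterministic_rule(left, right)
    )
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


m_probabilities = estimate_m(records)
print(m_probabilities)
```

With this toy data, four pairs satisfy the rule and two of them agree exactly on `first_name`, so each level gets an m estimate of 0.5.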

[Reproducible example](https://github.com/moj-analytical-services/splink_demos/blob/f48aeaccf77389ceba03fa844526c1d7edfdad68/quickstart_demo_deduplication.ipynb)

Particularly on small datasets, the proportion of matches is often greatly overestimated. You can see the calculations by turning on logging to level 15:

```
logging.basicConfig(
    format="%(message)s",
)
...
```
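For readers unfamiliar with numeric log levels, here is a minimal standard-library sketch of what level 15 means: it sits between `DEBUG` (10) and `INFO` (20), so a logger set to 15 emits messages logged at 15 or above while still suppressing plain debug output. The logger name `splink` is an assumption for illustration and is not confirmed by the truncated snippet above.

```python
import logging

# Print bare messages, matching the format string quoted in the issue.
logging.basicConfig(format="%(message)s")

# Assumption: the library logs under a logger named "splink".
logger = logging.getLogger("splink")
logger.setLevel(15)

logger.log(15, "emitted: logged at level 15")
logger.info("emitted: INFO is 20, above the threshold")
logger.debug("suppressed: DEBUG is 10, below the threshold")
```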

bug
enhancement

For example, if you want to register a sample of records to use with `find_matches_to_new_records`. In the DuckDB linker, at the moment you have to do:

```
df_sample = df.sample(1)
linker._con.register("__splink__df_new_records", df_sample)
matches...
```

enhancement