Robin Linacre comments

Results: 234 comments by Robin Linacre

[MAINT] `linker.py` cleaning/refactoring

This has been split apart in Splink 4, so closing.

Splink 4: Backwards-incompatible API changes

Other things that have come to mind:
- Greater consistency in naming
- We should refer to match weights as partial match weights wherever possible. E.g. at the moment, on...

Splink 4: Backwards-incompatible API changes

Do we think there may be a way of getting the Linker object to ‘deal with’ the backends automatically? Suppose we instantiate all linkers like `linker = Linker(df, dialect='spark')`. I’m...

Splink 4: Backwards-incompatible API changes

Re "'do away with' settings_dict, comparison_dict and comparison_level_dict": I think I pretty much agree with all of what you say. I'm still pretty wedded to the concept of a formal...

[FEAT] Support for embedding-based similarity functions

This is a really interesting idea and I'd love to see how it pans out. It would be amazing if Splink could leverage embeddings effectively. There's a rough and ready...

[FEAT] Support for embedding-based similarity functions

Nice - yeah, that def looks possible. This is pretty nasty, but it occurred to me that in dialects that don't support `zip_with` you could probably get to the same...

[FEAT] Support for embedding-based similarity functions

Also worth noting that the DuckDB team have already included Jaro-Winkler in direct response to a request from Splink users, so it's possible they would consider adding cosine similarity...

[FEAT] Support for embedding-based similarity functions

This is also worth keeping an eye on, as it might make it easier to write a UDF (e.g. for cosine distance): https://twitter.com/__AlexMonahan__/status/1621141511000961026

[FEAT] Support for embedding-based similarity functions

@mamonu has now created a 'first try' at a cosine similarity function for the Spark backend, which can be found here: https://github.com/moj-analytical-services/splink_scalaudfs/blob/embeddings/jars/scala-udf-similarity-0.1.1-EMBEDDINGSDEV.jar Here's a working Splink model using the above jar,...

[FEAT] Support for embedding-based similarity functions

@OlivierBinette you may be interested ☝️