Robin Linacre
Has been split apart in Splink 4, closing
Other things that have come to mind:
- Greater consistency in naming
- We should refer to match weights as partial match weights wherever possible. E.g. at the moment, on...
Do we think there may be a way of getting the Linker object to ‘deal with’ the backends automatically? Suppose we instantiate all linkers like `linker = Linker(df, dialect='spark')`. I’m...
Re "'do away with' settings_dict, comparison_dict and comparison_level_dict.": I pretty much agree with all of what you say. I'm still pretty wedded to the concept of a formal...
This is a really interesting idea and I'd love to see how it pans out - would be amazing if Splink could leverage embeddings effectively. There's a rough and ready...
Nice - yeah, that def looks possible. This is pretty nasty, but it occurred to me that in dialects that don't support `zip_with` you could probably get to the same...
Also worth noting that the DuckDB team have already included Jaro-Winkler in direct response to a request from Splink users, so it's possible they would consider adding cosine similarity...
This is also worth keeping an eye on: https://twitter.com/__AlexMonahan__/status/1621141511000961026 It might make it easier to write a UDF (e.g. for cosine distance).
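For reference, here's a minimal pure-Python sketch of the sort of function such a UDF might wrap. It's only an illustration: the names `cosine_similarity` and `cosine_distance` are made up for this example and aren't part of Splink or any backend, and treating a zero vector as having similarity 0.0 is an assumption rather than an agreed convention.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so report 0.0
    # instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance form: 0.0 for identical directions, 1.0 for orthogonal ones."""
    return 1.0 - cosine_similarity(a, b)
```

A row-at-a-time Python function like this would probably be much slower than a native implementation, so it seems more useful for prototyping comparison levels than for large linkage jobs.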
@mamonu has now created a 'first try' at a cosine similarity function for the Spark backend which can be found here: https://github.com/moj-analytical-services/splink_scalaudfs/blob/embeddings/jars/scala-udf-similarity-0.1.1-EMBEDDINGSDEV.jar Here's a working Splink model using the above jar,...
@OlivierBinette you may be interested ☝️