ADBond
Have looked into this, and it appears to be resolved (thanks to updates in sqlglot/duckdb).
re: report/commit - I think I'm in favour of just having a pass/fail, rather than committing back changes. Been thinking about this for a while, and with autocommit:

* fiddly...
For Spark we define a [custom UDF `DualArrayExplode`](https://github.com/moj-analytical-services/splink_scalaudfs/blob/904bc83807ffea4084fa81696c62a90f4031e5e5/src/main/scala/uk/gov/moj/dash/linkage/Similarity.scala#L207-L229) in the included `jar`, which performs the Cartesian product of two arrays. So the sort of SQL you would use would be, for...
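To make this concrete, here's a rough PySpark sketch - not verbatim Splink code; the jar path, registration details, and return type below are assumptions to check against the Scala source:

```python
from pyspark.sql import SparkSession, types

# Assumes the splink_scalaudfs jar is on the classpath; the path and the
# return type below are illustrative assumptions, not confirmed details
spark = (
    SparkSession.builder
    .config("spark.jars", "scala-udf-similarity.jar")
    .getOrCreate()
)

spark.udf.registerJavaFunction(
    "DualArrayExplode",
    "uk.gov.moj.dash.linkage.DualArrayExplode",
    types.ArrayType(types.ArrayType(types.StringType())),
)

df = spark.createDataFrame([(["a", "b"], ["x", "y"])], ["arr_l", "arr_r"])
df.createOrReplaceTempView("pairs")

# Explode the Cartesian product: one row per (left, right) combination,
# i.e. 4 rows for this 2x2 input
spark.sql(
    "SELECT explode(DualArrayExplode(arr_l, arr_r)) AS pair FROM pairs"
).show()
```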
This also contributes quite a bit of noise to the docs build logs (though it is not the only culprit)
While the plugin itself is now functioning correctly, it is not running correctly in GitHub Actions, as not enough git history is currently available in the context in which it runs:...
No longer using this plugin with Splink 4 docs
Think it would probably be good to break this module up eventually anyhow, maybe something like:

```
database_api/
├─ __init__.py      # imports `DuckDBAPI`, `SparkAPI` etc
├─ database_api.py  # core definition
├─ ...
```
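For illustration, the `__init__.py` would just re-export the concrete classes, so external imports don't change after the split (the submodule names here are hypothetical, following the tree above):

```python
# database_api/__init__.py -- re-export so that e.g.
# `from splink.database_api import DuckDBAPI` keeps working
# (submodule names are hypothetical)
from .duckdb_api import DuckDBAPI
from .spark_api import SparkAPI

__all__ = ["DuckDBAPI", "SparkAPI"]
```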
Yeah, that's exactly the sort of thing I mean. The key thing is being able to create a `ComparisonLevelCreator` that can delay construction until we know the dialect, which we...
> I guess it could also be built into `LiteralMatchLevel` as an arg

Yeah, this is what I was envisaging - it also has the advantage that we can handle dialect-dependence...
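As a rough sketch of the idea (names and signatures are illustrative, not a final API): the creator just stores its arguments, and only renders SQL once a dialect is supplied, so dialect-dependent details like quoting and casting live in one place.

```python
from abc import ABC, abstractmethod


class ComparisonLevelCreator(ABC):
    @abstractmethod
    def create_sql(self, sql_dialect: str) -> str:
        """Render the level's SQL condition once the dialect is known."""


class LiteralMatchLevel(ComparisonLevelCreator):
    def __init__(self, col_name: str, literal_value: str, literal_datatype: str):
        self.col_name = col_name
        self.literal_value = literal_value
        self.literal_datatype = literal_datatype

    def create_sql(self, sql_dialect: str) -> str:
        # Dialect-dependence handled at render time, e.g. identifier quoting
        q = "`" if sql_dialect == "spark" else '"'
        lit = f"CAST('{self.literal_value}' AS {self.literal_datatype})"
        return (
            f"{q}{self.col_name}_l{q} = {lit} "
            f"AND {q}{self.col_name}_r{q} = {lit}"
        )


# Construction happens up front; the dialect arrives later
level = LiteralMatchLevel("dob", "1990-01-01", "DATE")
print(level.create_sql("duckdb"))
print(level.create_sql("spark"))
```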
Haven't looked in great detail yet, but it looks like the table `__splink__df_concat_with_tf` is not consistent between iterations. Then when we Bernoulli sample from this, because the rows are...
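For reference, one way to make a sample immune to row-order changes is to derive the keep/drop decision from a hash of a stable key rather than `random()` - a hypothetical illustration in duckdb, not Splink's actual sampling code:

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE df AS SELECT range AS unique_id FROM range(1000)")

# Deterministic ~10% Bernoulli-style sample: the decision depends only on
# the row's unique_id, so the same rows are selected even if the table is
# rebuilt in a different order between iterations
sample = con.execute(
    "SELECT * FROM df WHERE hash(unique_id) % 100 < 10"
).df()
print(len(sample))
```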