ADBond
Have looked into this, and it appears to be resolved (thanks to updates in sqlglot/duckdb).
re: report/commit - I think I'm in favour of just having a pass/fail, rather than committing back changes. Been thinking about this for a while, and with autocommit:

* fiddly...
For Spark we define a [custom UDF `DualArrayExplode`](https://github.com/moj-analytical-services/splink_scalaudfs/blob/904bc83807ffea4084fa81696c62a90f4031e5e5/src/main/scala/uk/gov/moj/dash/linkage/Similarity.scala#L207-L229) in the included `jar`, which performs the Cartesian product of two arrays. So the sort of SQL you would use would be, for...
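To make this concrete, here's a rough PySpark sketch - not verbatim Splink code; the jar path, registration details, and return type below are assumptions to check against the Scala source:

```python
from pyspark.sql import SparkSession, types

# Assumes the splink_scalaudfs jar is on the classpath; the path and the
# return type below are illustrative assumptions, not confirmed details
spark = (
    SparkSession.builder
    .config("spark.jars", "scala-udf-similarity.jar")
    .getOrCreate()
)

spark.udf.registerJavaFunction(
    "DualArrayExplode",
    "uk.gov.moj.dash.linkage.DualArrayExplode",
    types.ArrayType(types.ArrayType(types.StringType())),
)

df = spark.createDataFrame([(["a", "b"], ["x", "y"])], ["arr_l", "arr_r"])
df.createOrReplaceTempView("pairs")

# Explode the Cartesian product: one row per (left, right) combination,
# i.e. 4 rows for this 2x2 input
spark.sql(
    "SELECT explode(DualArrayExplode(arr_l, arr_r)) AS pair FROM pairs"
).show()
```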
This also contributes quite a bit of noise to the docs build logs (though it is not the only culprit)
While the plugin itself is now functioning correctly, it is not running correctly in GitHub Actions, as not enough git history is currently available in the context in which it runs:...
No longer using this plugin with Splink 4 docs
Think it would probably be good to break this module up eventually anyhow, maybe something like:

```
database_api/
├─ __init__.py      # imports `DuckDBAPI`, `SparkAPI` etc
├─ database_api.py  # core definition
├─ ...
```
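For illustration, the `__init__.py` would just re-export the concrete classes, so external imports don't change after the split (the submodule names here are hypothetical, following the tree above):

```python
# database_api/__init__.py -- re-export so that e.g.
# `from splink.database_api import DuckDBAPI` keeps working
# (submodule names are hypothetical)
from .duckdb_api import DuckDBAPI
from .spark_api import SparkAPI

__all__ = ["DuckDBAPI", "SparkAPI"]
```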
Yeah, that's exactly the sort of thing I mean. The key thing is being able to create a `ComparisonLevelCreator` that can delay construction until we know the dialect, which we...
> I guess it could also be built into `LiteralMatchLevel` as an arg

Yeah, this is what I was envisaging - it also has the advantage that we can handle dialect-dependence...
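As a rough sketch of the idea (names and signatures are illustrative, not a final API): the creator just stores its arguments, and only renders SQL once a dialect is supplied, so dialect-dependent details like quoting and casting live in one place.

```python
from abc import ABC, abstractmethod


class ComparisonLevelCreator(ABC):
    @abstractmethod
    def create_sql(self, sql_dialect: str) -> str:
        """Render the level's SQL condition once the dialect is known."""


class LiteralMatchLevel(ComparisonLevelCreator):
    def __init__(self, col_name: str, literal_value: str, literal_datatype: str):
        self.col_name = col_name
        self.literal_value = literal_value
        self.literal_datatype = literal_datatype

    def create_sql(self, sql_dialect: str) -> str:
        # Dialect-dependence handled at render time, e.g. identifier quoting
        q = "`" if sql_dialect == "spark" else '"'
        lit = f"CAST('{self.literal_value}' AS {self.literal_datatype})"
        return (
            f"{q}{self.col_name}_l{q} = {lit} "
            f"AND {q}{self.col_name}_r{q} = {lit}"
        )


# Construction happens up front; the dialect arrives later
level = LiteralMatchLevel("dob", "1990-01-01", "DATE")
print(level.create_sql("duckdb"))
print(level.create_sql("spark"))
```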
Haven't looked in great detail yet, but it looks like the table `__splink__df_concat_with_tf` is not consistent between iterations. Then when we Bernoulli sample from this, because the rows are...
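For reference, one way to make a sample immune to row-order changes is to derive the keep/drop decision from a hash of a stable key rather than `random()` - a hypothetical illustration in duckdb, not Splink's actual sampling code:

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE df AS SELECT range AS unique_id FROM range(1000)")

# Deterministic ~10% Bernoulli-style sample: the decision depends only on
# the row's unique_id, so the same rows are selected even if the table is
# rebuilt in a different order between iterations
sample = con.execute(
    "SELECT * FROM df WHERE hash(unique_id) % 100 < 10"
).df()
print(len(sample))
```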