Robin Linacre
Some initial experimentation with an LLM prompt: ```txt # MAIN INSTRUCTION TO LLM # CONSIDER THIS EXAMPLE AND GIVE ME AN EXAMPLE OF HOW TO USE SQLGLOT...
Some further notes: - This specifically applies to predict() so we shouldn't need to touch much of the codebase. https://github.com/moj-analytical-services/splink/blob/8495fb75d2a68d15c11e713543a6fac2a450a982/splink/internals/comparison.py#L218 comparison._case_statement generates: ``` CASE WHEN "first_name_l" IS NULL OR "first_name_r"...
Check that in duckdb 1.2 case statements are not optimized ```python import time import duckdb num_rows = 10_000_000 con = duckdb.connect() # Start timing start_time = time.time() # Create empty...
An approach that seems to be working pretty well; I still need to figure out how to integrate it ```python import sqlglot from sqlglot import exp from typing import Dict, Set import duckdb...
Interestingly, Spark optimises this for the user: you get exactly the same timings with or without the optimisation. ``` # Run autoreload to prevent having to restart kernel...
see also https://github.com/duckdb/duckdb/discussions/16338
Closing as this optimisation is now included in duckdb 1.3.0+. See discussion in https://github.com/moj-analytical-services/splink/pull/2738
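For anyone landing here later, a minimal sketch of the kind of redundancy this thread is about. It assumes the optimisation in question is reusing a function call that is repeated across the branches of a generated CASE statement; the thread excerpts do not spell this out, so treat that as an assumption. To stay dependency-free it uses Python's built-in `sqlite3` rather than DuckDB or Spark, and `similarity`, `pairs` and `scored` are hypothetical names invented for the illustration, not Splink internals.

```python
import sqlite3

# Counter so we can see how often the "expensive" function really runs.
calls = {"n": 0}


def similarity(a, b):
    """Toy stand-in for an expensive string comparison (hypothetical)."""
    calls["n"] += 1
    if a == b:
        return 1.0
    return 0.8 if a[:1] == b[:1] else 0.0


con = sqlite3.connect(":memory:")
con.create_function("similarity", 2, similarity)
con.execute("CREATE TABLE pairs (name_l TEXT, name_r TEXT)")
con.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("john", "john"), ("john", "jon"), ("john", "mary")],
)

# Naive form: the same call is written once per threshold, so an engine
# that does not deduplicate it may evaluate it once per branch reached.
naive_sql = """
SELECT CASE
    WHEN similarity(name_l, name_r) >= 0.9 THEN 2
    WHEN similarity(name_l, name_r) >= 0.7 THEN 1
    ELSE 0
END
FROM pairs
ORDER BY rowid
"""
calls["n"] = 0
naive_levels = [row[0] for row in con.execute(naive_sql)]
naive_calls = calls["n"]

# Reuse form: compute the value once per row into a temp table,
# then let the CASE branches read the stored column.
calls["n"] = 0
con.execute(
    "CREATE TEMP TABLE scored AS "
    "SELECT rowid AS pair_id, similarity(name_l, name_r) AS sim FROM pairs"
)
reuse_sql = """
SELECT CASE WHEN sim >= 0.9 THEN 2 WHEN sim >= 0.7 THEN 1 ELSE 0 END
FROM scored
ORDER BY pair_id
"""
reuse_levels = [row[0] for row in con.execute(reuse_sql)]
reuse_calls = calls["n"]

print("levels match:", naive_levels == reuse_levels)
print("calls (naive vs reuse):", naive_calls, reuse_calls)
```

Materialising into a temp table is just the simplest way to force a single evaluation per row in this toy setup. It is not the sqlglot-based rewrite mentioned earlier in the thread, and not necessarily what duckdb 1.3.0+ does internally.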
@omri374 we have had the same problem and have fixed it this way: https://github.com/moj-analytical-services/splink/pull/2033
Thanks! Looks reasonable to me, but Andy was the original author of this code, so I will get his opinion first. Might also solve #2341. @ADBond, does this look ok to you?
Hi @vfrank66, thanks very much for this. We're a bit short-staffed at the moment but will endeavour to review ASAP. Don't worry too much about the test failures for...