splink
Consistent date difference level across backends
`Datediff` measures boundary partitions crossed, so e.g. '30th Jan to 1st Feb is 1 month, but so is 1st Jan to 28th Feb'. Currently there is a discrepancy in `DatediffLevel` between backends which behave like this (duckdb) and those which treat year/month as 'time intervals' (spark, postgres).
Arguably the 'boundary partition' approach is not that useful compared to 'fixed intervals', which is perhaps more intuitive.
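
To make the two semantics concrete, here is a minimal Python sketch (not Splink code, and not any backend's actual implementation). The function names and the 30-day month are assumptions chosen for illustration only.

```python
from datetime import date


def months_by_boundaries(start: date, end: date) -> int:
    """Count calendar-month boundaries crossed between two dates
    (the 'boundary partition' reading of datediff)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def months_by_fixed_interval(
    start: date, end: date, days_per_month: float = 30.0
) -> float:
    """Elapsed time expressed in fixed-length months.
    The 30-day month is an assumption for this sketch."""
    return abs((end - start).days) / days_per_month


close_pair = (date(2023, 1, 30), date(2023, 2, 1))  # 2 days apart
far_pair = (date(2023, 1, 1), date(2023, 2, 28))    # 58 days apart

for label, pair in (("close", close_pair), ("far", far_pair)):
    print(
        label,
        months_by_boundaries(*pair),
        round(months_by_fixed_interval(*pair), 2),
    )
```

Both pairs cross exactly one month boundary, so the boundary count cannot tell them apart, whereas the fixed-interval measure puts the first pair well under a month and the second well over.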
Proposal is to replace `DatediffLevel` with a `DateDifferenceLevel` (changing the name to make it clear this does not align with 'datediff' semantics), which behaves consistently across backends and essentially treats 1 year as 12 months and 1 month as 30 days (modulo some details on those specific numbers). True `datediff` behaviour using boundary partitions will still be available via custom SQL for any users who still require it.
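
As a rough illustration of what a backend-independent fixed-interval check could look like, the sketch below converts the threshold to days and compares it with the absolute gap between the dates. `within_date_difference` and `DAYS_PER_UNIT` are hypothetical names, and the conversion factors (30-day month, 12-month year) are placeholders taken from the proposal wording, not the values the final implementation necessarily uses.

```python
from datetime import date

# Assumed conversion factors: 1 month = 30 days, 1 year = 12 months.
DAYS_PER_UNIT = {"day": 1, "month": 30, "year": 12 * 30}


def within_date_difference(d1: date, d2: date, threshold: float, unit: str) -> bool:
    """Return True if the two dates are at most `threshold` units apart,
    using fixed-length units rather than calendar boundaries."""
    if unit not in DAYS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return abs((d1 - d2).days) <= threshold * DAYS_PER_UNIT[unit]


# 2 days apart: within one month under fixed intervals.
print(within_date_difference(date(2023, 1, 30), date(2023, 2, 1), 1, "month"))
# 58 days apart: not within one month, although only one boundary is crossed.
print(within_date_difference(date(2023, 1, 1), date(2023, 2, 28), 1, "month"))
```

Because the comparison only depends on the number of elapsed days, the same rule can be expressed in any SQL dialect that can subtract two dates, which is what would make the level consistent across backends.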
Yep - fully support this, and should make the implementation simpler as well!
Closed by #1940