feat(duckdb): add microbatch support for incremental strategies
Added version-aware support for MERGE statements in the DuckDB adapter for DuckDB 1.4.0 and above. Introduced a `_supports_merge` property that checks and caches version compatibility, enabling the "merge" and "microbatch" incremental strategies alongside the existing "append" and "delete+insert". Updated the SQL macros to implement the microbatch and merge logic for efficient batch processing and upserts.
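For illustration, on DuckDB 1.4.0 and above the merge path produces an upsert of roughly this shape; the table and column names below are made up for the example, not taken from the actual macros:

```sql
-- Illustrative only: the kind of statement a merge-based upsert runs on DuckDB >= 1.4.0.
MERGE INTO events AS target
USING events__dbt_tmp AS source
    ON target.event_id = source.event_id
WHEN MATCHED THEN
    UPDATE SET event_time = source.event_time, value = source.value
WHEN NOT MATCHED THEN
    INSERT (event_id, event_time, value)
    VALUES (source.event_id, source.event_time, source.value);
```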
Ah @gfrmin thanks for this-- we also have #605 that adds support for the merge strategy and has a richer set of configs for it afaict.
I don't normally need to reconcile two different impls of the same feature; happy to discuss this in the dbt Slack or the DuckDB Discord
Oh wow, thanks for the tip! Will join the Slack to deduplicate efforts :)
@gfrmin something unhappy here in this latest run: https://github.com/duckdb/dbt-duckdb/actions/runs/18803285639/job/53661107934?pr=644
I updated how timestamps in the fixtures are defined to overcome 'timestamp is not UTC' errors
@gfrmin okay, 605 is in-- you should be good to rebase on main and then we can just focus on the microbatch stuff!
Hey! Great effort already. I have a couple of thoughts on this after trying it out on the ClickBench dataset and running into some big performance differences and out-of-memory issues.
The current microbatch implementation uses the new merge strategy, but falls back to delete+insert on DuckDB versions below 1.4.0. However, those strategies behave quite differently. The delete+insert path deletes with `where ({{ unique_key }}) in (select ({{ unique_key }}) from {{ source }})` on a non-composite key and then inserts, whereas the merge materialization matches rows against the full set of predicates (similar to composite keys with delete+insert).
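To spell out the difference, the non-composite-key delete+insert batch boils down to something like the following (table names here are illustrative), while merge instead matches each row against the full predicate set before updating or inserting:

```sql
-- Roughly what the single-key delete+insert path executes (illustrative names):
DELETE FROM events
WHERE (event_id) IN (SELECT (event_id) FROM events__dbt_tmp);

INSERT INTO events
SELECT * FROM events__dbt_tmp;
```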
From a performance perspective I think we should think about microbatching like this:
- batches get inserted by date part (hour/day/month/etc)
- row groups end up effectively clustered by this date part, without any explicit ordering, simply because batches are inserted in that order
- deleting existing row groups and replacing them should be fast, because the query planner can prune on something like `date_trunc('day', event_time)` based on the microbatching configuration (see the sketch below)
This has the added benefit that when ducklake is fully supported, you can do actual partition overwrites on the underlying folders.
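To make the pruning idea concrete, a daily microbatch could delete just its own slice before re-inserting it; the table, column, and date below are illustrative:

```sql
-- Delete only the rows that belong to this batch's day before re-inserting them.
DELETE FROM events
WHERE date_trunc('day', event_time) = DATE '2024-01-01';

-- An equivalent half-open range on event_time should prune at least as well:
-- DELETE FROM events
-- WHERE event_time >= TIMESTAMP '2024-01-01' AND event_time < TIMESTAMP '2024-01-02';
```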
I'd be happy to contribute the changes, but to conclude: I think we should infer the date part to delete on, and keep a separate microbatch materialization that can be adjusted when e.g. ducklake support arrives, rather than delegating to either merge or delete+insert now.
Additionally, I think the adapter's delete+insert strategy is now so close to dbt's default delete+insert strategy that we might as well fall back to that and have one less incremental strategy to maintain.
I spent a bit of time today trying to understand how the dbt microbatching process behaves on DuckDB with different configurations, but I'm honestly struggling to get it to work well. With limited memory (e.g. 8GB) I constantly get OOM errors, which would make the microbatch strategy unusable for me unless it runs on something like ducklake.
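In case it helps anyone reproducing this: the memory cap and thread count can be constrained directly with DuckDB settings. The values below just mirror the 8GB limit and the single-/multi-thread targets mentioned here; they aren't the exact profile from the repo:

```sql
-- Constrain DuckDB to reproduce the memory-limited runs (values illustrative).
SET memory_limit = '8GB';
SET threads = 1;  -- bump this up for the multi-thread target
```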
If anyone wants to replicate some of the tests I'm running, here's the repo I used to easily create a few tests based on clickbench data. These are the results on my end:
| model_name | target_name | model_status | model_runtime_s |
|---|---|---|---|
| table | incremental_multi_thread | success | 25.070116 |
| incremental__del_ins_date_partition | incremental_multi_thread | success | 81.30077 |
| incremental__del_ins_ukey_date | incremental_multi_thread | success | 87.08315 |
| incremental__del_ins_ukey | incremental_multi_thread | success | 93.9573 |
| microbatch_default | incremental_single_thread | success | 204.85132 |
| microbatch_ukey | incremental_multi_thread | partial success | 87.09769 |
| microbatch_default_concurrent | incremental_multi_thread | partial success | 93.76382 |
| microbatch_default | incremental_multi_thread | partial success | 95.302986 |
| microbatch_ukey_date_partition | incremental_multi_thread | partial success | 101.5172 |
| microbatch_ukey | incremental_single_thread | partial success | 220.9444 |
| microbatch_ukey_date_partition | incremental_single_thread | error | 1.297792 |
| microbatch_date_partition | incremental_multi_thread | error | 9.356611 |
| microbatch_date_partition | incremental_single_thread | error | 13.908075 |
| incremental__merge | incremental_multi_thread | error | 103.943726 |
| incremental__merge_update_columns | incremental_multi_thread | error | 109.29556 |
| incremental__merge_ukey_date | incremental_multi_thread | error | 114.55945 |
Hi @gfrmin, @jwills, sorry for the duplication, but unfortunately I can't push to this PR as a maintainer or to @gfrmin's branch, so I've created a separate PR, #681, to show my approach to microbatching with DuckDB. The changes are all in the microbatch.sql materialization. This prevents OOM errors, and I think it'll be more robust when support for ducklake is added in the future. Let me know what you think! 🙏