dbt-core
dbt-core copied to clipboard
[SPIKE] Testing PR #6322 (build performance update)
This ticket encompasses the work necessary to review PR #6322 (related to #6073) and determine the following:
- Does this actually improve performance? The contributor claims some level of performance improvement, but we need proper benchmarks and the core team should be responsible for building said benchmarks-- especially since initial testing on smaller projects and synthetic benchmarking projects doesn't seem to show much of an improvement.
- Is this algorithm the right choice to use across the board? Does it have performance characteristics that fit everyone's needs?
Once this spike determines this work is suitable for what we need here we can work with the contributor to correct some code issues and add some testing. If it doesn't we should probably prioritize putting dbt engineering hours into this since it's obviously been an issue for some time and judging by slack conversations, there's a real paint point to be solved here.
This ticket ~may~ should also drive us to get better performance monitoring setup around the build
command.
Slack refs:
- https://dbt-labs.slack.com/archives/C04191CFLK0/p1673281639142599?thread_ts=1672911813.317039&cid=C04191CFLK0
- https://dbt-labs.slack.com/archives/C0131TY7EEA/p1673291456196899
Prior art for performance benchmarking + regression testing: https://github.com/dbt-labs/dbt-core/tree/main/performance
With partial parsing enabled, I believe we could use this top-level command to reliably isolate the time associated with compiling a DAG with "extra test edges": dbt build --exclude fqn:*
Scope of this ticket is just to decide if want to merge this PR or not. Additional foundational performance improvements are out of scope for now.
Using dbt's built-in timing info (thanks @dbeatty10!), I ran dbt --no-partial-parse compile
on a gnarly DAG sized 8468 models, 17103 tests. It took 14m4s to run on dbt-core#tag/v1.3.2 code, and 16m3s to run on dbt-core#tag/v1.3.2 with #6322 patch applied to dbt/graph/graph.py
.
Looking into the charts below, we see that indeed the changed code in #6322 seems to have added time.

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.