[CT-1182] [Feature] dbt_metrics should be part of Core
Is this your first time submitting a feature request?
- [X] I have read the expectations for open source contributors
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
I want to clarify that this is a rough idea, but it came to mind due to version errors.
As we begin to move forward with the semantic layer and work with partners, the dependency between core and metrics has become very clear.
Advantages
- Tightly coupling metrics and core will only become more necessary as we begin adding more semantic capabilities. Having this occur sooner, rather than later, would get us on that path.
- Metrics macros in core make metrics / the semantic layer easier to use because package dependencies are reduced.
- Debugging metrics becomes easier because we now only need to ask for the single version instead of both core and package.
Disadvantages
- Reduced speed of iteration due to the slower release cycles of dbt-core. This could be partially resolved with patch releases for non-breaking changes, but that could conflict with core's practice of patches not containing new features?
- Work? I have no idea whether this is easy or hard.
- Backwards compatibility. We'd want to make sure that however people are querying metrics today they can use the same syntax tomorrow.
Describe alternatives you've considered
Keeping them separate.
Who will this benefit?
Anyone who is using metrics
Are you interested in contributing this feature?
Definitely.
Anything else?
No response
Thanks for starting the discussion @callum-mcdata! (And an issue for this feels right, since it tracks real work.)
> Reduced speed of iteration due to the slower release cycles of dbt-core. This could be partially resolved with patch releases for non-breaking changes, but that could conflict with core's practice of patches not containing new features?

IMO this is the most salient reason why we kicked this off in a package, instead of in `dbt-core`. We wanted rapid iteration during the "experimental" period! Now that we're gearing up for the dbt Semantic Layer launch, I think it's appropriate to stabilize the metric spec and ask where the foundational logic for it should live.
Adding some thoughts of my own:

- Accessibility: By keeping the code in a separate `dbt-metrics` package, which is entirely "user-space" code (SQL + Jinja), we'd also be keeping it more accessible to contribution by end users of dbt who might have a harder time contributing to `dbt-core`. (We should aim to make `dbt-core` contribution as accessible as possible—but there's inevitably more stuff at play here.)
- Focus: A separate repo also provides us a focused place for issues and discussions that are specific to how metrics are calculated. As we've seen this year, some of those issues might turn into issues for `dbt-core`, but certainly not all — I'd guess ~a third.
- Stability: A change to the metric spec requires PRs in multiple repos (`dbt-core` + `dbt-metrics`), rather than just a coordinated change in one repo. I think this is a good thing! It adds some healthy friction that slows us down before introducing changes that will require migrations for users and partner integrations (even with backwards compatibility to ease that migration). We need to stop and think before we adjust the metric spec, because we're really changing an API. This is a new field to document, a new field that might want to be exposed in the metadata API...
- Easier migrations? That being said, in cases where we do need to coordinate changes across `dbt-core` + `dbt-metrics`: The good news is that, starting in v1.3+, `dbt deps` will start respecting the `require-dbt-version` config specified in packages (https://github.com/dbt-labs/dbt-core/pull/5651). So if we make changes to `dbt-metrics` that require v1.4, people running v1.3 won't automatically install it — they'll keep installing an older version. (There's a small config sketch further down this comment.) This doesn't totally prevent us from still landing in dependency hell, but it does provide users with a gentler migration pathway, one that doesn't require us posting "please pin your version!!"
- Signalling maturity: If we choose to keep the `dbt-metrics` package separate, we should seriously consider bumping the package version to v1.0 — both as a signal of maturity, and a guarantee of backward compatibility — alongside the semantic layer launch. If we were to move the code into `dbt-core`, which is already major-version one, we'd have to feel comfortable making the same guarantee.
- Adapter-specific code: I do think the right place for this to live is in `dbt-core` + adapter plugins, wherever we can abstract away cross-database utilities / building blocks that enable `dbt-metrics` macros to be platform-agnostic. This allows folks on other data platforms to use `dbt-metrics`, without requiring a complex shim/dispatch setup. (How much of this code is there? With some basic searching, I could only find one instance of a non-default / adapter-specific macro implementation: `redshift__metric_average`.)
Having typed that all out, I think I'd be in favor of:

- keeping most of `dbt-metrics` as a separate package
- refactoring lower-level functionality and reimplementing it inside `dbt-core` (e.g. a future path for https://github.com/dbt-labs/dbt-core/issues/5734)
- moving adapter-specific macros that we could distribute within adapter plugins; I'd be in favor of doing those things too
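To make the "Easier migrations?" point above concrete, here is a minimal sketch of what the `require-dbt-version` config looks like in a package's `dbt_project.yml`. The package name and version bounds shown are illustrative only, not the real ones:

```yaml
# dbt_project.yml shipped inside a hypothetical dbt-metrics release
# (package name and version bounds are illustrative only)
name: metrics
version: 1.0.0
config-version: 2

# From dbt v1.3, `dbt deps` respects this when resolving package versions,
# so users on an older dbt-core keep getting an older, compatible release.
require-dbt-version: [">=1.4.0", "<2.0.0"]
```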
But I am always open to being convinced otherwise!
We might hit the same problem outlined in this issue if we do this, so we should be aware: https://github.com/dbt-labs/dbt-core/issues/5720#issuecomment-1230747304
Great conversation on this live with the Core/DX team!
@Fleid made a good point: If metric compilation is "core" functionality, it should be part of "the core install"
These are separate questions:
- Where should the code live: in `dbt-core` (this repo) or a separate codebase (dbt package/plugin)?
- How should the code be distributed — opt-in, or by default in all dbt Core installations v1.3+? Maybe by default for everyone!
Wacky idea: Could we add default packages / add `packages.yml` to the "global" project?? (This doesn't work today, but I think it would be simple to do.)
Coming from my experience with all the semantic layer integrations, the metrics package is core functionality because it is the only way we can interact with metrics. I recently asked Cam how we expect users to test metrics, and his first response was using the metrics `calculate` macro. If we want users to follow our best practices (develop, test, document) right off the bat, we shouldn't make installing a package an obstacle. Leaving it out feels like telling the story only halfway.
That said, I don't know the appetite for putting in something that will likely have breaking changes and need quick iteration. I don't think we will be done with the breaking changes, as we are continually pushed on what metrics can and should do with the semantic layer.
Side note: my concern with default `packages.yml` is that a good deal of our packages only cover the three major data platforms dbt is most used on :) But it might be cool if we did it at the adapter level.
Coming back to this issue now that we've had time to give it some thought. The overwhelming feedback from both internal stakeholders and beta customers has convinced us that `dbt_metrics` should be part of `dbt-core`. A lot of this stems from the fact that we're positioning metrics as an intrinsic part of dbt, instead of an additional feature that can be added on at will. To @amychen1776's point, the package is the only way to interact with metrics, so it feels onerous for users to have to maintain dependencies for something they view as tightly coupled.
@cafzal might have a different opinion here, but I would consider this a must-have in one of the next three minor version releases (assuming launch timelines for the semantic layer remain the same). However, I think keeping it as a package as long as we can allows us to balance community feedback & fast releases with developer experience.
That out of the way, how to bundle it all together!
Ideal dream state that is probably a pipe dream
- Paired with dbt-core: when you install core you also install the metric querying logic
- Patch separate from core: we could release patch fixes to this codebase without having to release a new patch for core.
- Adapter flexible: adapter maintainers could override components for their specific syntax. We'd want to have tests that ensure specific values regardless of adapter
- Integrated CI: any changes to core would be tested against this functionality and any changes to this functionality would be tested against core.
Proposed Solutions
- Default Package: I share the same concern that Amy has around default packages. Introducing the concept of default packages for adapters could be a cool workaround but I worry that we'd still run into the same dependency issues 🤔
- Plugin: @jtcohen6 can you speak a bit more to what this might look like? Does this give us any more flexibility than keeping the code directly within dbt-core?
@callum-mcdata Agree with the requirements laid out in your pipe dream!
> - Paired with dbt-core: when you install core you also install the metric querying logic
This is the biggest pain point that we're looking to solve for. Top priority of any solution.
> - Patch separate from core: we could release patch fixes to this codebase without having to release a new patch for core.
For the reasons mentioned above, I'd like to keep the metrics compilation logic in a separate repository if possible. This provides us better visibility on issues and PRs, allows for a different set of maintainers / reviewers, and gives us the ability to patch `dbt-metrics` independently of `dbt-core`. (Understanding that, if it's paired with `dbt-core` during installation/distribution, the vast majority of users are going to get the latest compatible patch by default.)
> - Adapter flexible: adapter maintainers could override components for their specific syntax. We'd want to have tests that ensure specific values regardless of adapter
Any differences in SQL syntax should be abstracted away into dispatched macros, and ideally solved via utilities defined in `dbt-core` + adapter plugins. Last time I checked, there were very few of these in `dbt-metrics` as it is.
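In the meantime, an adapter maintainer or end user who needs an override can point dbt at their own implementations with the `dispatch` project config. A minimal sketch, assuming the `metrics` macro namespace and a placeholder project name:

```yaml
# dbt_project.yml in the overriding project (project name is a placeholder)
# dbt will search `my_project` for implementations such as redshift__metric_average
# before falling back to the macros shipped in the `metrics` package.
dispatch:
  - macro_namespace: metrics
    search_order: ['my_project', 'metrics']
```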
> - Integrated CI: any changes to core would be tested against this functionality and any changes to this functionality would be tested against core.
Here's how we solve for this in adapter plugins today:
- `dbt-core` runs its tests in CI using `dbt-postgres`, which happens to be located in this repository, but doesn't strictly need to be (https://github.com/dbt-labs/dbt-core/issues/6189)
- Every adapter repo installs the latest changes from `dbt-core`, and runs its own tests in CI
- We run all the tests, everywhere, on all "supported" release branches (= latest patches of minor versions released in the past 12 months), twice per day
To my mind, tests on metrics compilation code would be like our tests on adapter plugins — we'd run them in CI on any changes to `dbt-metrics`, and twice a day against `dbt-core` latest `main`, to make sure we haven't made `dbt-core` changes that accidentally break them.
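As a rough sketch of what that scheduled run could look like in the `dbt-metrics` repo (assuming GitHub Actions; the workflow name, test layout, and commands here are hypothetical):

```yaml
# .github/workflows/test-against-core-main.yml (hypothetical)
name: Test against dbt-core main
on:
  schedule:
    - cron: "0 */12 * * *"   # twice per day
  workflow_dispatch: {}

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      # Install unreleased dbt-core (and dbt-postgres) straight from main
      - run: pip install "git+https://github.com/dbt-labs/dbt-core.git#subdirectory=core"
      - run: pip install "git+https://github.com/dbt-labs/dbt-core.git#subdirectory=plugins/postgres"
      # Run this repo's own integration tests against that install
      # (requirements file and test path are hypothetical)
      - run: pip install -r dev-requirements.txt
      - run: python -m pytest tests/integration
```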
> - Default Package: I share the same concern that Amy has around default packages. Introducing the concept of default packages for adapters could be a cool workaround but I worry that we'd still run into the same dependency issues 🤔
> - Plugin: @jtcohen6 can you speak a bit more to what this might look like? Does this give us any more flexibility than keeping the code directly within dbt-core?
@emmyoop's proposal at breakfast the other week: rather than thinking about `dbt-metrics` as a "dbt package" (installed by `dbt deps`), we could start thinking about it as a PyPI package, versioned and installed like adapter plugins. Each adapter plugin already includes a dbt project, following a specific path pattern: `dbt/include/<adaptername>`. Those "internal"/"include" packages are then loaded up at the same point where dbt looks in the `dbt_packages` folder for any additional packages that were put there by `dbt deps`:
https://github.com/dbt-labs/dbt-core/blob/be4a91a0fe35a619587b7a0145e190690e3771c6/core/dbt/config/runtime.py#L346
One caveat: "internal" packages from adapter plugins all share the global `dbt` namespace. Should metrics queries move into the global `dbt` namespace, or should we retain the separate `metrics` namespace? The global namespace gets bigger and more unwieldy every release, so the latter would be my preference.
It seems like we could:

- Turn `dbt-metrics` from a valid "dbt package," installed from the Hub, into a valid Python package, installed from PyPI — even though there's no actual Python code in `dbt-metrics`! We'd just need to add some setuptools-y metadata that bundles up the various `.sql` and `.yml` files
- Either add `dbt-metrics` to the installation requirements of `dbt-core`, or just always install `dbt-metrics` in our managed deployment environments (dbt Cloud)
- Release `dbt-metrics` in minor-version lockstep with `dbt-core` for now. Patches can be independent. More complex versioning can come later on, as we move toward major-version compatibility for plugins.
It's worth sense-checking this with other folks, to make sure we're not recommending a pattern that we'll regret later. I've added this issue to the v1.4 milestone, not because we absolutely must resolve it by the v1.4 release (January), but to make sure we're talking about it!
I like the idea @jtcohen6 laid out here, with `dbt-metrics` becoming its own Python package that is bundled with `dbt-core` as a dependency when we release (or, if we just care about Cloud, we have it on the Cloud images specifically). Keeping it in its own repo is preferred for the reasons Jeremy explained, and it will make maintaining both it and `dbt-core` much easier.