dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-1182] [Feature] dbt_metrics should be part of Core

Open callum-mcdata opened this issue 2 years ago • 4 comments

Is this your first time submitting a feature request?

  • [X] I have read the expectations for open source contributors
  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I want to clarify that this is a rough idea but came to mind due to version errors

As we begin to move forward with the semantic layer and working with partners, the dependency of core and metrics has been very clear.

Advantages

  • Tightly coupling metrics and core will only become more necessary as we begin adding more semantic capabilities. Having this occur sooner, rather than later, would get us on that path.
  • Metrics macros in core create ease of use for metrics / semantic layer because package dependencies are reduced.
  • Debugging metrics becomes easier because we now only need to ask for the single version instead of both core and package.

Disadvantages

  • Reduced speed of iteration due to the slower release cycles of dbt-core. This could be partially resolved with patch releases for non-breaking changes but could potentially break core's practice on patch not containing new features?
  • Work? I have no idea whether this is easy or hard.
  • Backwards compatibility. We'd want to make sure that however people are querying metrics today they can use the same syntax tomorrow.

Describe alternatives you've considered

Keeping them separate.

Who will this benefit?

Anyone who is using metrics

Are you interested in contributing this feature?

Definitely.

Anything else?

No response

callum-mcdata avatar Sep 14 '22 17:09 callum-mcdata

Thanks for stating the discussion @callum-mcdata! (And an issue for this feels right, since it tracks real work.)

Reduced speed of iteration due to the slower release cycles of dbt-core. This could be partially resolved with patch releases for non-breaking changes but could potentially break core's practice on patch not containing new features?

IMO this is the most salient reason why we kicked off this in a package, instead of in dbt-core. We wanted rapid iteration during the "experimental" period! Now that we're gearing up for dbt Semantic Layer launch, I think it's appropriate to stabilize the metric spec and ask where the foundational logic for it should live.

Adding some thoughts of my own:

  • Accessibility: By keeping the code in a separate dbt-metrics package, which is entirely "user-space" code (SQL + Jinja), we'd also be keeping it more accessible to contribution by end users of dbt who might have a harder time contributing to dbt-core. (We should aim to make dbt-core contribution as accessible as possible—but there's inevitably more stuff at play here.)
  • Focus: A separate repo also provides us a focused place for issues and discussions that are specific to how metrics are calculated. As we've seen this year, some of those issues might turn into issues for dbt-core, but certainly not all — I'd guess ~a third.
  • Stability: A change to the metric spec requires PRs in multiple repos (dbt-core + dbt-metrics), rather than just a coordinated change in one repo. I think this is a good thing! It adds some healthy friction, that slows us down before introducing changes that will require migrations for users and partner integrations (even with backwards compatibility to ease that migration). We need to stop and think before we adjust the metric spec, because we're really changing an API. This is a new field to document, a new field that might want to be exposed in the metadata API...
  • Easier migrations? That being said, in cases where we do need to coordinate changes across dbt-core + dbt-metrics: The good news is that, starting in v1.3+, dbt deps will start respecting the require-dbt-version config specified in packages (https://github.com/dbt-labs/dbt-core/pull/5651). So if we make changes to dbt-metrics that require v1.4, people running v1.3 won't automatically install it — they'll keep installing an older version. This doesn't totally prevent us from still landing in dependency hell, but it does provide users with a gentler migration pathway, one that doesn't require us posting "please pin your version!!"
  • Signalling maturity: If we choose to keep the dbt-metrics package separate, we should seriously consider bumping the package version to v1.0 — both as a signal of maturity, and a guarantee of backward compatibility — alongside the semantic layer launch. If we were to move the code into dbt-core, which is already major-version one, we'd have to feel comfortable making the same guarantee.
  • Adapter-specific code: I do think the right place for this to live is in dbt-core + adapter plugins, wherever we can abstract away cross-database utilities / building blocks that enable dbt-metrics macros to be platform-agnostic. This allows folks on other data platforms to use dbt-metrics, without requiring a complex shim/dispatch setup. (How much of this code is there? With some basic searching, I could only find one instance of a non-default / adapter-specific macro implementation: redshift__metric_average

Having typed that all out, I think I'd be in favor of:

  • keeping most of dbt-metrics as a separate package
  • refactoring lower-level functionality, and reimplementing inside dbt-core (e.g. a future path for https://github.com/dbt-labs/dbt-core/issues/5734)
  • moving adapter-specific macros we could distribute within adapter plugins, I'd be in favor of doing those things too.

But I am always open to being convinced otherwise!

jtcohen6 avatar Sep 15 '22 07:09 jtcohen6

We might hit the same problem outlined in this issue if we do this so we should be aware: https://github.com/dbt-labs/dbt-core/issues/5720#issuecomment-1230747304

leahwicz avatar Sep 15 '22 17:09 leahwicz

Great conversation on this live with the Core/DX team!

@Fleid made a good point: If metric compilation is "core" functionality, it should be part of "the core install"

These are separate questions:

  • Where should the code live: in dbt-core (this repo) or separate codebase (dbt package/plugin)
  • How should the code be distributed — opt-in, or by default in all dbt Core installations v1.3+? Maybe by default for everyone!

Wacky idea: Could we add default packages / add packages.yml to the "global" project ?? (This doesn't work today, but I think it would be simple to do)

jtcohen6 avatar Sep 20 '22 19:09 jtcohen6

Coming from my experience with all the semantic layer integrations, the metrics package is a core functionality because it is the only way we can interact with metrics. I recently questioned Cam about how one expects the user to test metrics and his first response was using the metrics calculate macro. If we want a user to follow our best practices (develop, test, document) right off the bat, we shouldn't put installing a package as an obstacle. To leave it out feels like telling the story only halfway.

That said, I don't know the appetite for putting in something that will likely have breaking changes and need quick iteration. I don't think we will be done with the breaking changes as we are continually pushed on what metrics can and should do with the semantic layer.

Side note: My concern with default packages.yml is a good deal of our packages only cover the 3 major data platforms we are used on :) But it might be cool if we did it on an adapter level.

amychen1776 avatar Sep 22 '22 12:09 amychen1776

Coming back to this issue now that we've had time to give it some thought. The overwhelming feedback from both internal stakeholders and beta customers has led us to be convinced that dbt_metrics should be part of dbt-core. A lot of it stems from that fact that we're positioning metrics as an intrinsic part of dbt, instead of an additional feature that can be added on at will. To @amychen1776 's point, the package is the only way to interact with metrics so it feels onerous to users to have to maintain dependencies for something they view as tightly coupled.

@cafzal might have a different opinion here but I would consider this a must have in one of the next three minor version releases (assuming launch timelines for the semantic layer remain the same). However, I think keeping it as a package as long as we can allows us to balance community feedback & fast releases with developer experience.

That out of the way, how to bundle it all together!

Ideal dream state that is probably a pipe dream

  • Paired with dbt-core: when you install core you also install the metric querying logic
  • Patch separate from core: we could release patch fixes to this codebase without having to release a new patch for core.
  • Adapter flexible: adapter maintainers could override components for their specific syntax. We'd want to have tests that ensure specific values regardless of adapter
  • Integrated CI: any changes to core would be tested against this functionality and any changes to this functionality would be tested against core.

Proposed Solutions

  • Default Package: I share the same concern that Amy has around default packages. Introducing the concept of default packages for adapters could be a cool workaround but I worry that we'd still run into the same dependency issues 🤔
  • Plugin: @jtcohen6 can you speak a bit more to what this might look like? Does this give us any more flexibility than keeping the code directly within dbt-core?

callum-mcdata avatar Oct 14 '22 15:10 callum-mcdata

@callum-mcdata Agree with the requirements laid out in your pipe dream!

  • Paired with dbt-core: when you install core you also install the metric querying logic

This is the biggest pain point that we're looking to solve for. Top priority of any solution.

  • Patch separate from core: we could release patch fixes to this codebase without having to release a new patch for core.

For the reasons mentioned above, I'd like to keep the metrics compilation logic in a separate repository if possible. This provides us better visibility on issues and PRs, allows for a different set of maintainers / reviewers, and gives us the ability to patch dbt-metrics independently of dbt-core. (Understanding that, if it's paired with dbt-core during installation/distribution, the vast majority of users are going to get the latest compatible patch by default.)

  • Adapter flexible: adapter maintainers could override components for their specific syntax. We'd want to have tests that ensure specific values regardless of adapter

Any differences in SQL syntax should be abstracted away into dispatched macros, and ideally solved via utilities defined in dbt-core + adapter plugins. Last time I checked, there were very few of these in dbt-metrics as it is.

  • Integrated CI: any changes to core would be tested against this functionality and any changes to this functionality would be tested against core.

Here's how we solve for this in adapter plugins today:

  • dbt-core runs its tests in CI using dbt-postgres, which happens to be located in this repository, but doesn't strictly need to be (https://github.com/dbt-labs/dbt-core/issues/6189)
  • Every adapter repo installs latest changes from dbt-core, and runs its own tests in CI
  • We run all the tests, everywhere, on all "supported" release branches (= latest patches of minor versions released in the past 12 months), twice per day

To my mind, tests on metrics compilation code would be like our tests on adapter plugin — we'd run them in CI on any changes to dbt-metrics, and twice a day against dbt-core latest main, to make sure we haven't made dbt-core changes that accidentally break them


  • Default Package: I share the same concern that Amy has around default packages. Introducing the concept of default packages for adapters could be a cool workaround but I worry that we'd still run into the same dependency issues 🤔
  • Plugin: @jtcohen6 can you speak a bit more to what this might look like? Does this give us any more flexibility than keeping the code directly within dbt-core?

@emmyoop's proposal at breakfast the other week: rather than thinking about dbt-metrics as a "dbt package" (installed by dbt deps), we could start thinking about it as a PyPI package, versioned and installed like adapter plugins. Each adapter plugin already includes a dbt project, following a specific path pattern: dbt/include/<adaptername>. Those "internal"/"include" packages are then loaded up, at the same point where dbt is looking in the dbt_pacakages folder for any additional packages that were put there by dbt deps:

https://github.com/dbt-labs/dbt-core/blob/be4a91a0fe35a619587b7a0145e190690e3771c6/core/dbt/config/runtime.py#L346

One caveat: "internal" packages from adapter plugins all share the global dbt namespace. Should metrics queries move into the global dbt namespace, or should we retain the separate metrics namespace? The global namespace gets bigger and more unwieldy every release, so the latter would be my preference.


It seems like we could:

  • Turn dbt-metrics from a valid "dbt package," installed from the Hub, into a valid Python package, installed from PyPI — even though there's no actual Python code in dbt-metrics! — we'd just need to add some setuptools-y metadata that bundles up the various .sql and .yml files
  • Either add dbt-metrics to the installation requirements of dbt-core, or just always install dbt-metrics in our managed deployment environments (dbt Cloud)
  • Release dbt-metrics in minor-version lockstep with dbt-core for now. Patches can be independent. More complex versioning can come later on, as we move toward major-version compatibility for plugins.

It's worth sense-checking this with other folks, to make sure we're not recommending a pattern that we'll regret later. I've added this issue to the v1.4 milestone, not because we absolutely must resolve it by the v1.4 release (January), but to make sure we're talking about it!

jtcohen6 avatar Nov 01 '22 16:11 jtcohen6

I like the idea @jtcohen6 laid out here with dbt-metrics becoming its own Python package but is bundled with dbt-core as a dependency when we release (or if we just care about Cloud, we have it on the Cloud images specifically). Keeping it in it's own repo is preferred for the reasons Jeremy explained which will make maintaining it and also maintaining dbt-core much easier

leahwicz avatar Nov 01 '22 17:11 leahwicz