dbt-project-evaluator icon indicating copy to clipboard operation
dbt-project-evaluator copied to clipboard

New Rules! for model governance (dbt Core v1.5)

Open jtcohen6 opened this issue 1 year ago • 8 comments

Describe the feature

https://dbt-labs.github.io/dbt-project-evaluator/0.5/rules/

I'd like to propose some New Rules, related to the model governance features that are new in v1.5.

  • Public Models Have Contracts: Every model with access: public also has contract: {enforced: true}
  • Public Models Are Documented: Every model with access: public has a top-level description and description for every column (this rule, but more)
  • Column Naming Convention: For columns in models with explicitly defined contracts, we can assert that columns' names and data types match our opinionated recommendations.
  • Exposures Public Parents: Exposures depend only on models with access: public (similar to this rule)
  • At Most Three Versions: There should be, at most, three versions of a given model: one latest, one old, one prerelease
  • Group Coverage: Calculate percentage of resources in groups. Groupable resources are everything except sources + exposures. Raise a warning about resources that aren't in groups, with an owner. (This will be a heavier lift at first; we should build tooling to make it easier!)
  • Model Access Ratio: Calculate ratios of model access: private, protected, public. My absolutely arbitrary guess at an ideal breakdown would be somewhere in the ballpark of 50/40/10%. Every group should have at least one private model.

Related to deprecation_date (https://github.com/dbt-labs/dbt-core/issues/7433), coming in v1.6:

  • All old model versions (< latest_version) must have a deprecation_date
  • All models with a deprecation_date earlier than today should be removed

Very open to thoughts & feedback on all of the above — and more ideas, of course :)

Who will this benefit?

Developers & maintainers of large & complex projects, who want to start taking advantage of v1.5 features for model governance

Are you interested in contributing this feature?

I can help with refining/brainstorming, &/or possibly contribute with some of my hacktime :)

jtcohen6 avatar Apr 23 '23 19:04 jtcohen6

Love all of these ideas and am excited to address new best practices for model contracts. Do you think these best practices fall into one of our pre-existing categories (modeling, testing, documentation, structure, performance) or does this call for a new category?

I think some of these fall neatly into modeling (such as Exposures Public Parents) - but for the others, I'm thinking about creating a new category for "governance".

graciegoheen avatar Apr 25 '23 20:04 graciegoheen

In my mind, most of them would fall on a new Governance category. Mostly because contracts/governance is quite opt-in and it would be great for people to activate/deactivate Governance rules if they want or not.

The one that I am less sure about is "Column Naming Convention". While it only really works when governance is in place and all columns are listed, this feels more about modelling that Governance to me.

b-per avatar Apr 26 '23 09:04 b-per

loving this list of new no nos:

we're gonna tackle this, likely in this priority order:

Definitely:

  1. Public Models Have Contracts: for sure!
  2. Public Models Are Documented: of course! - column level checks will be a slightly new construct here, but doing a model-level check for all columns have descriptions should be doable!
  3. Exposures Public Parents: deal! organzing all of these into a Governance folder will be key here, since those who don't adopt mesh stuff might get some noise on existing exposures

Maybe:

  1. Group Coverage: We're on board here, but given how little we know about how people will use this, not sure we have a clear opinionated stance on whether you should group everything
  2. At Most Three Versions: Similar to above -- likely a good check, but until we see folks using versions, may be hard to properly capture all the use cases in a single check
  3. Column Naming Convention: this one seems hard as hell! column naming is something that we have found to be the most flexible, and while there's value in checking for a standard within a project, our recommendations are definitely not the most common ones. The signal to noise ratio might be a bit low, and this would invite a lot of requests for overrides/customization
  4. Model Access Ratio: We agree that all the have are gut checks on what the right ratios would be 😅 we may create a similar "governance coverage" model like the test and documentation coverage models we have, and this could be a good candidate!

dave-connors-3 avatar Apr 27 '23 16:04 dave-connors-3

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Oct 25 '23 01:10 github-actions[bot]

@dave-connors-3 , you were looking at those at some points. Are you OK with closing this issue with the changes you did or do you think there is more work to be done?

b-per avatar Oct 25 '23 09:10 b-per

How about fct_public_models_without_exposures ?

In our teams, all public models have external access, and writing the exposures are sometimes missed. This new governance is useful to avoid the missing.

IMO, public models without exposures are rare, it should be into dbt_project_evaluator_exceptions.csv as exception or disable the rule. (maybe wrong)

Please tell us how do you think about it ..

sadahry avatar Dec 20 '23 08:12 sadahry

Hi @sadahry , thanks for adding this suggestion!

I don't know if your rule would apply to a majority of use cases.

In my mind, public models are about the ability to ref() those models from different dbt groups and/or projects, which is something different from "public" in the sense that those models are consumed outside of dbt by other people and tools.

Internally we define exposures on models that are not public and have public models without exposures.

What I would recommend instead here is for you to create your own custom model and test on top of the models the package is creating (you can check the existing models and tests for inspiration) and you can then run the project evaluator package with dbt build --select package:dbt_project_evaluator+ (with the + after) to pick up the default models/tests and your custom ones

b-per avatar Dec 20 '23 08:12 b-per

Understood. I'll create a custom model. Thanks!

sadahry avatar Dec 20 '23 09:12 sadahry