dbt-project-evaluator
Documentation evaluation: column level and invalid
Describe the feature
The existing documentation checks only pick up undocumented models, but there are some extensions that would be useful.
- invalid documented models, e.g. where the documented model does not exist (perhaps because it has been removed or renamed but the documentation has not)
- undocumented columns, as per the existing model functionality but per column instead
- invalid documented columns, as per the "invalid documented models" suggestion above but per column instead
Column-level warnings are likely to be very verbose, so fine-grained control over what is reported would be necessary, and the tests should probably be disabled by default: opt in, not opt out.
Describe alternatives you've considered
I have seen (although I can't find them now) cleverly constructed tests that handle the undocumented case in a very heavy-handed way. IIRC the tests needed to be defined per model and/or per column, and they inferred the non-existence of the model or column by running invalid queries.
I have not seen anything for identifying undocumented columns elsewhere.
Additional context
Presumably not DB specific.
Who will this benefit?
Same target audience as the other documentation checks: those wanting a good level of documentation coverage across their data structures.
Are you interested in contributing this feature?
I would, but have no idea where to start (more a JS dev than a Python dev).
Hey @elyobo! Thanks for opening the issue!
For situations where there is documentation for models or columns that don't exist, I believe that dbt will throw a [WARNING] message at the beginning of each dbt command. Is this the behavior you'd also want to build a check into this package for?
Column-level documentation coverage is a really cool idea. I don't think we currently have any checks built out at the column level. What would the ideal output of a model that checks this look like? One row per undocumented column? Or one row per model with a list of its undocumented columns?
cc: @graciegoheen for visibility!
Can confirm that models that don't exist are warned about (so that feature is probably redundant, unless wanted for consistency) but columns don't appear to be.
Example model warning below. No warning is given for the this_column_does_not_exist column, which I added a description to on a model that does exist but does not have that column.
23:47:39 [WARNING]: Did not find matching node for patch with name 'this_model_does_not_exist' in the 'models' section of file 'models/report/_report__models.yml'
I was thinking row per column, similar to undocumented models.
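To make the row-per-column idea concrete, here is a rough sketch of the comparison in Python. It is illustration only, not how this package implements its checks. It assumes dbt's manifest.json and catalog.json have already been loaded into dicts, and that both keep a `nodes` mapping keyed by unique id whose entries carry a `columns` mapping (with `resource_type`, `name` and per-column `description` on the manifest side). The function name `column_doc_gaps` and the output row shape are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def column_doc_gaps(manifest: dict, catalog: dict) -> Tuple[List[Row], List[Row]]:
    """Return (undocumented, invalid) lists with one row per model column.

    undocumented: columns that exist in the catalog but have no description.
    invalid: columns described in YAML that the catalog does not contain.
    Assumed layout: both inputs have a "nodes" dict keyed by unique id.
    """
    undocumented: List[Row] = []
    invalid: List[Row] = []
    for unique_id, node in sorted(manifest.get("nodes", {}).items()):
        if node.get("resource_type") != "model":
            continue
        # Columns declared in YAML, compared case-insensitively because
        # some warehouses report column names in upper case.
        declared = {
            name.lower(): (col.get("description") or "").strip()
            for name, col in node.get("columns", {}).items()
        }
        catalog_node = catalog.get("nodes", {}).get(unique_id)
        if catalog_node is None:
            continue  # model not in the catalog, so nothing to compare against
        actual = {name.lower() for name in catalog_node.get("columns", {})}
        for column in sorted(actual):
            if not declared.get(column):
                undocumented.append({"model": node["name"], "column": column})
        for column in sorted(set(declared) - actual):
            invalid.append({"model": node["name"], "column": column})
    return undocumented, invalid
```

A column that is declared in YAML with an empty description counts as undocumented here, the same as one that is not declared at all. Whether those two cases should be reported separately is an open design choice.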
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.