sagerx icon indicating copy to clipboard operation
sagerx copied to clipboard

Get RxNorm data models up to speed with dbt best practices

Open lprzychodzien opened this issue 1 year ago • 2 comments

Resolves #270

Explanation

Changes rxnorm staging and intermediate queries to using dbt's jinja table references.

Most important is that it sets intermediate models to materialize as tables (vs views).

Rationale

dbt mart models are made up of complex logic that should be captured in the intermediate models. The issues already that we ran into (issue #270 ) was that performing these aggregations for each query took 10+ minutes. Therefore, bringing this aggregation into a separated intermediate model to be materialized as a table is a good solution. The materialization of these intermediate tables can take awhile but will significantly speed up queries.

Additional work can be done to optimize the queries that build out intermediate models in the future.

Tests

  1. What testing did you do? dbt run --full-refresh
  2. Attach testing logs inside a summary block:
testing logs

lprzychodzien avatar Apr 28 '24 20:04 lprzychodzien

Man you did tons of work here. Great job. Fundamental question themes:

  1. Should we title ctes specifically to what they represent even if we are just pulling in rxnconso multiple times? Or should we have one rxnconso cte per model and use aliases and leave everything else the same? Or should we name them what they represent and pull the where clause into the cte? Or should we make like other "base" staging tables with them? Or should we just look for opportunities to reference these staging models from other int/stg models instead of going back to source all the time?
  2. I'm def referencing source tables from a lot of intermediate models - shame on me. I need to fix this.
  3. Minor formatting changes still needed to satisfy my OCD (lowercase SQL commands and tab indent in some places).
  4. Some ref's still in select from instead of within a cte first.
  5. Maybe we should build a dbt macro for the active / prescribable columns - prob in a separate issue.

Another major thing I will look at is taking a 50,000 foot view of the entire RxNorm data model - it's hard to see when zoomed in on a given intermediate model, but I think I re-wrote a lot of the code in other intermediate models inside of intermediate models. So I think there's an opportunity to ref intermediate models instead of re-writing that SQL.

jrlegrand avatar May 01 '24 16:05 jrlegrand

I'd like to start moving toward styling our SQL like the dbt styleguide.

https://docs.getdbt.com/best-practices/how-we-style/2-how-we-style-our-sql

jrlegrand avatar Jul 19 '24 02:07 jrlegrand