Improve catalog performance

Open benc-db opened this issue 1 year ago • 0 comments
Description

After significant testing, I've found that fetching column data from information_schema, as currently implemented, is too expensive. The fastest way I've found to populate the data the catalog requires is SHOW TABLE EXTENDED. The tradeoff is that you will not get column comments that exist outside of your dbt project. In exchange, performance in my testing is on par with or better than 1.6.x (and substantially better than 1.7.x), and we can re-enable retrieving metadata for specific relations rather than gathering everything in the schema. Real-world performance may vary as a function of:

a.) HMS vs. UC (though they should be fairly similar now).
b.) The size of the project.
c.) The number of tables in schemas referenced by the project that are not themselves part of the project.
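
To make the relation-based approach concrete, here is a minimal sketch of the idea, not the adapter's actual implementation. It assumes that SHOW TABLE EXTENDED accepts a `|`-separated name pattern and that the returned `information` text lists columns as lines such as ` |-- id: long (nullable = true)`. The function names `build_show_table_extended` and `parse_columns` are hypothetical and exist only for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of a top-level column line in the `information` text:
# " |-- <name>: <type> (nullable = true|false)"
COLUMN_LINE = re.compile(
    r"^ \|-- (.+?): (.+) \(nullable = (?:true|false)\)$", re.MULTILINE
)


def build_show_table_extended(schema: str, identifiers: Iterable[str]) -> str:
    """Build one statement that covers only the named relations.

    Falls back to '*' (everything in the schema) when no names are given.
    Hypothetical helper: no quoting or escaping is attempted here.
    """
    pattern = "|".join(sorted(set(identifiers))) or "*"
    return f"SHOW TABLE EXTENDED IN {schema} LIKE '{pattern}'"


def parse_columns(information: str) -> List[Tuple[str, str]]:
    """Extract (column_name, data_type) pairs from the `information` text."""
    return [(m.group(1), m.group(2)) for m in COLUMN_LINE.finditer(information)]


if __name__ == "__main__":
    print(build_show_table_extended("analytics", ["orders", "customers"]))
    sample = (
        "Database: analytics\n"
        "Table: orders\n"
        "Schema: root\n"
        " |-- id: long (nullable = true)\n"
        " |-- amount: decimal(10,2) (nullable = true)\n"
    )
    print(parse_columns(sample))
```

Because column names and types come from the statement's text output, comments are only available for columns that the dbt project itself documents, which is the tradeoff described above.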

Checklist

  • [x] I have run this code in development and it appears to resolve the stated issue
  • [ ] This PR includes tests, or tests are not required/relevant for this PR
  • [ ] I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

benc-db avatar May 09 '24 23:05 benc-db