dbt-databricks
rows_affected not returned by adapter
I am using the dbt artifacts package to monitor ETL runs. The package supports logging the number of rows affected, but it requires the adapter to return this information. Currently the dbt-databricks adapter doesn't seem to return it in the result object, so the column is empty.
Is it possible to return this information for all materializations where data is persisted (at least the table and incremental materializations)?
What would be the timeline for this feature to become available? Having insight into rows_affected on persisted tables (materialization = table or incremental) is key to:
- understanding whether a model accidentally loads incorrect or duplicated data due to incorrect logic in the model
- seeing what data volume is processed and identifying growth in data volume over time
A possible solution direction (though there may be a smarter, more efficient, and more secure way to accomplish this):
Databricks creates Delta Lake tables by default, and Delta Lake tables preserve the history of CRUD operations executed on the table.
For a table materialization, the number of rows affected can be fetched with the following steps:
1. Run DESCRIBE HISTORY schema.table_name (for example, DESCRIBE HISTORY x239_int.campaign).
2. Get the last record where the operation column matches the statement that wrote the table (for example, CREATE OR REPLACE TABLE AS SELECT).
3. Get the number of rows written from column operationMetrics -> numOutputRows.
For an incremental materialization, the number of rows affected can be retrieved the same way as for a table materialization, but the operation column should equal WRITE or MERGE, depending on whether the incremental strategy is append, insert_overwrite, or merge. The number of rows affected can be retrieved from the operationMetrics column as well.
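As a rough sketch of the lookup described above, the Python function below picks the most recent data-writing entry from the rows of DESCRIBE HISTORY and reads a row count from operationMetrics. It assumes the history rows have already been fetched and converted to dictionaries. The function name, the set of operation names, and the choice to sum the insert/update/delete metrics for MERGE are illustrative assumptions, not something dbt-databricks provides.

```python
from typing import Any, Mapping, Optional, Sequence

# Operation names assumed to represent a data write in the Delta history.
# Adjust this set to the operations your materializations actually produce.
WRITE_OPERATIONS = {
    "WRITE",
    "MERGE",
    "CREATE TABLE AS SELECT",
    "REPLACE TABLE AS SELECT",
    "CREATE OR REPLACE TABLE AS SELECT",
}

# Metric keys assumed to describe the rows touched by a MERGE.
MERGE_METRICS = (
    "numTargetRowsInserted",
    "numTargetRowsUpdated",
    "numTargetRowsDeleted",
)


def rows_affected_from_history(
    history: Sequence[Mapping[str, Any]],
) -> Optional[int]:
    """Return the row count of the latest write operation, or None if unknown."""
    # Ignore maintenance entries such as OPTIMIZE or VACUUM.
    writes = [row for row in history if row.get("operation") in WRITE_OPERATIONS]
    if not writes:
        return None

    # The highest version is the most recent commit.
    latest = max(writes, key=lambda row: row["version"])
    metrics = latest.get("operationMetrics") or {}

    if latest["operation"] == "MERGE":
        # Metric values are stored as strings, so convert before summing.
        present = [int(metrics[key]) for key in MERGE_METRICS if key in metrics]
        return sum(present) if present else None

    value = metrics.get("numOutputRows")
    return int(value) if value is not None else None
```

Returning None when no metric is found keeps the column empty rather than reporting a misleading zero.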
The logic above should be executed directly after the execution of the SQL or Python code defined in the dbt model.
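To illustrate the timing, here is a minimal sketch of fetching the newest history entry right after a model has run. It assumes a PEP 249 (DB-API) style cursor connected to Databricks; the helper name and the identifier check are hypothetical and not part of the adapter. DESCRIBE HISTORY returns entries newest first, so LIMIT 1 yields the latest commit, which may still need to be checked for its operation type.

```python
import re
from typing import Any, Dict, Optional

# Accept only plain one-, two-, or three-part names (table, schema.table,
# catalog.schema.table) so the name can be interpolated into SQL safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def latest_history_entry(cursor: Any, relation: str) -> Optional[Dict[str, Any]]:
    """Run DESCRIBE HISTORY for `relation` and return the newest entry as a dict."""
    if not _IDENTIFIER.fullmatch(relation):
        raise ValueError(f"unexpected relation name: {relation!r}")

    cursor.execute(f"DESCRIBE HISTORY {relation} LIMIT 1")
    row = cursor.fetchone()
    if row is None:
        return None

    # Per PEP 249, the first item of each description tuple is the column name.
    columns = [col[0] for col in cursor.description]
    return dict(zip(columns, row))
```

Calling this immediately after the model's statement, on the same connection, reduces the chance that another writer's commit is picked up instead, although it does not rule it out.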
Hi,
Is there any chance that this feature gets planned in upcoming releases?
Thanks!
We are taking it into consideration for planning. If you would like to expedite, consider submitting a PR with an implementation :).
Hi @benc-db. Thanks for your feedback. Sorry if my comment sounded like a request -- it was not. Should I need it much sooner, I will definitely look into contributing! And thank you for the adapter, it is very useful.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.