dbt-databricks

rows_affected not returned by adapter

Open mvdwielen opened this issue 2 years ago • 5 comments

I am using the dbt artifacts package to monitor ETL runs. The package supports logging the number of rows affected, but it requires the adapter to return this information. Currently the dbt-databricks adapter doesn't seem to return it in the result object, so the column is empty.

Is it possible to return this information for all materializations where data is persisted (at least the table and incremental materializations)?

mvdwielen avatar May 23 '23 12:05 mvdwielen

What would be the timeline for this feature to become available? Insight into rows_affected on persisted tables (materialization = table or incremental) is key to:
- verify that a model doesn't accidentally load incorrect or duplicated data due to incorrect logic in the model
- understand what data volume is processed and identify growth in data volume over time

A possible solution direction (though there may be a smarter, more efficient, and more secure way to accomplish this):

Databricks creates Delta Lake tables by default, and Delta Lake tables preserve the history of the CRUD operations executed on the table.

For a table materialization, the number of rows affected can be fetched by performing the following steps:
1. Run DESCRIBE HISTORY schema.table_name (for example, DESCRIBE HISTORY x239_int.campaign).
2. Get the last record where the operation column matches the statement that wrote the table (for example, CREATE OR REPLACE TABLE AS SELECT).
3. Get the number of rows written from the operationMetrics column -> numOutputRows.
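The table-materialization steps above can be sketched in Python. This is only an illustration, not the adapter's implementation: the function name rows_written and the sample history rows are hypothetical, and it assumes the history has already been fetched as a list of dicts whose operationMetrics values are strings under the key numOutputRows (the key Delta Lake uses for rows written).

```python
from typing import Iterable, Mapping, Optional, Set


def rows_written(history: Iterable[Mapping], operations: Set[str]) -> Optional[int]:
    """Return numOutputRows from the newest history entry whose operation matches.

    Returns None when no entry matches or the metric is missing.
    """
    matching = [entry for entry in history if entry["operation"] in operations]
    if not matching:
        return None
    # The highest version number is the most recent commit on the table.
    latest = max(matching, key=lambda entry: entry["version"])
    metrics = latest.get("operationMetrics") or {}
    value = metrics.get("numOutputRows")
    # Metric values are assumed to arrive as strings, so convert explicitly.
    return int(value) if value is not None else None


# Hypothetical rows shaped like DESCRIBE HISTORY output (most columns omitted).
sample_history = [
    {"version": 0, "operation": "CREATE OR REPLACE TABLE AS SELECT",
     "operationMetrics": {"numFiles": "1", "numOutputRows": "1200"}},
    {"version": 1, "operation": "OPTIMIZE",
     "operationMetrics": {"numRemovedFiles": "2"}},
    {"version": 2, "operation": "CREATE OR REPLACE TABLE AS SELECT",
     "operationMetrics": {"numFiles": "1", "numOutputRows": "1250"}},
]

# Operation names assumed for a table materialization.
table_operations = {"CREATE TABLE AS SELECT", "CREATE OR REPLACE TABLE AS SELECT"}
print(rows_written(sample_history, table_operations))
```

Filtering on the operation name before picking the newest version keeps maintenance commits such as OPTIMIZE from being mistaken for the model's write.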


For an incremental materialization, the number of rows affected can be retrieved the same way as for a table materialization, but the operation column should be equal to WRITE or MERGE, depending on whether the incremental strategy is append, insert_overwrite, or merge. The number of rows affected can be retrieved from the operationMetrics column as well.
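For the incremental case, a minimal sketch of the strategy-to-operation lookup might look like the following. The names STRATEGY_TO_OPERATION and incremental_rows_affected are hypothetical. Two assumptions are made here: append and insert_overwrite are recorded as WRITE while merge is recorded as MERGE, and "rows affected" for a merge means inserted plus updated plus deleted target rows. Whether that is the right definition is a judgment call.

```python
from typing import Mapping, Optional

# Assumed mapping from dbt incremental strategy to the Delta history operation.
STRATEGY_TO_OPERATION = {
    "append": "WRITE",
    "insert_overwrite": "WRITE",
    "merge": "MERGE",
}


def incremental_rows_affected(strategy: str, entry: Mapping) -> Optional[int]:
    """Derive rows affected from one history entry for the given strategy.

    Returns None if the entry was not produced by the expected operation.
    """
    expected = STRATEGY_TO_OPERATION[strategy]
    if entry["operation"] != expected:
        return None
    metrics = entry.get("operationMetrics") or {}
    if expected == "MERGE":
        # Assumption: count every target row the merge inserted, updated, or deleted.
        keys = ("numTargetRowsInserted", "numTargetRowsUpdated", "numTargetRowsDeleted")
        return sum(int(metrics.get(key, 0)) for key in keys)
    value = metrics.get("numOutputRows")
    return int(value) if value is not None else None


# Hypothetical history entries (metric values are strings, as in a string map).
merge_entry = {
    "operation": "MERGE",
    "operationMetrics": {
        "numTargetRowsInserted": "40",
        "numTargetRowsUpdated": "10",
        "numTargetRowsDeleted": "0",
    },
}
append_entry = {"operation": "WRITE", "operationMetrics": {"numOutputRows": "300"}}

print(incremental_rows_affected("merge", merge_entry))
print(incremental_rows_affected("append", append_entry))
```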

The logic above should be executed directly after the execution of the SQL or Python code defined in the dbt model.
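As a rough illustration of running the lookup right after the model, the sketch below reads only the newest history entry through a DB-API style cursor. The helper latest_history_entry and the FakeCursor stand-in are hypothetical. It assumes the cursor exposes execute, fetchone, and description as described in PEP 249, and that the relation name comes from a trusted source, since it is interpolated into the statement.

```python
from typing import Any, Dict, Optional


def latest_history_entry(cursor: Any, relation: str) -> Optional[Dict[str, Any]]:
    """Fetch the most recent Delta history entry for a relation as a dict."""
    # DESCRIBE HISTORY lists the newest commit first, so LIMIT 1 is the last operation.
    cursor.execute(f"DESCRIBE HISTORY {relation} LIMIT 1")
    row = cursor.fetchone()
    if row is None:
        return None
    # PEP 249: the first item of each description tuple is the column name.
    columns = [column[0] for column in cursor.description]
    return dict(zip(columns, row))


class FakeCursor:
    """Stand-in for a real connection cursor so the sketch runs without a cluster."""

    description = [("version",), ("operation",), ("operationMetrics",)]

    def __init__(self) -> None:
        self.executed = []

    def execute(self, sql: str) -> None:
        self.executed.append(sql)

    def fetchone(self):
        return (7, "WRITE", {"numOutputRows": "300"})


cursor = FakeCursor()
entry = latest_history_entry(cursor, "schema.table_name")
print(entry["operation"], entry["operationMetrics"]["numOutputRows"])
```

Another session could commit to the same table between the model's write and this lookup, so a real implementation would probably need to check that the entry belongs to the current run.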

mvdwielen avatar Jun 22 '23 07:06 mvdwielen

Hi,

Is there any chance that this feature gets planned in upcoming releases?

Thanks!

gpodevijn avatar Sep 11 '23 12:09 gpodevijn

We are taking it into consideration for planning. If you would like to expedite, consider submitting a PR with an implementation :).

benc-db avatar Sep 13 '23 16:09 benc-db

Hi @benc-db. Thanks for your feedback. Sorry if my comment sounded like a demand -- it was not meant as one. Should I need it much sooner, I will definitely look into contributing! And thank you for the adapter, it is very useful.

gpodevijn avatar Sep 14 '23 08:09 gpodevijn

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

github-actions[bot] avatar Apr 01 '24 01:04 github-actions[bot]