Using Delta Live Tables Sink as a Custom Materialization in dbt
Describe the feature
We are using Databricks and would like to implement all of our data transformations, both streaming and batch, entirely within dbt. dbt already supports Streaming Tables on Databricks, which are backed by Delta Live Tables (DLT) under the hood. Additionally, per the Databricks documentation, a DLT pipeline can write stream output to a Kafka topic using writeStream. We would like to be able to express that write path as a custom materialization in dbt as well.
Describe alternatives you've considered
Additional context
The approach described in the Databricks documentation (a sketch follows this list) includes:
- Setting up Kafka configurations (broker URL, topic, security settings)
- Creating a DLT pipeline
- Defining a streaming source (files, Delta tables, etc.)
- Using writeStream with Kafka options to publish the data
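As a point of reference, here is a minimal sketch of that manual approach using Spark Structured Streaming outside of dbt. The broker address, topic, source table, key column, and checkpoint path are placeholders, not values from this issue:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, struct, to_json

# Placeholder Kafka settings; substitute your broker URL and topic.
KAFKA_BOOTSTRAP_SERVERS = "broker-1:9092"
KAFKA_TOPIC = "orders_events"

spark = SparkSession.builder.getOrCreate()

# Streaming source: a Delta table here, but cloud files etc. work as well.
source_df = spark.readStream.table("main.bronze.orders")

# The Kafka sink expects key/value columns; serialize each row as JSON into `value`.
kafka_df = source_df.select(
    col("order_id").cast("string").alias("key"),
    to_json(struct("*")).alias("value"),
)

# Publish to Kafka with writeStream; the checkpoint makes the stream restartable.
# Security settings (e.g. kafka.security.protocol, kafka.sasl.*) would be added
# as further .option(...) calls.
(
    kafka_df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS)
    .option("topic", KAFKA_TOPIC)
    .option("checkpointLocation", "/Volumes/main/bronze/checkpoints/orders_to_kafka")
    .start()
)
```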
The introduction of the new Sinks API in DLT addresses the need to write processed data to external event streams, such as Apache Kafka and Azure Event Hubs, as well as to Delta tables. These features are currently in Public Preview, with plans for further expansion.
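For illustration, this is roughly how the Sinks API is wired up inside a DLT pipeline according to the Public Preview documentation. The sink name, topic, broker, and source table below are illustrative, and the exact API surface may change while the feature is in preview:

```python
import dlt
from pyspark.sql.functions import col, struct, to_json

# Declare a Kafka sink (Public Preview); options mirror Structured Streaming's Kafka options.
dlt.create_sink(
    name="orders_kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "broker-1:9092",
        "topic": "orders_events",
    },
)

# An append flow feeds the sink from a streaming source defined in the same pipeline.
@dlt.append_flow(name="orders_to_kafka", target="orders_kafka_sink")
def orders_to_kafka():
    return (
        dlt.read_stream("orders_silver")
        .select(
            col("order_id").cast("string").alias("key"),
            to_json(struct("*")).alias("value"),
        )
    )
```

Exposing this pattern as a dbt materialization is what this feature request is asking for.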
Who will this benefit?
This feature will benefit users who want to manage their data pipelines entirely within dbt and need to publish streaming data to external platforms such as Kafka and Azure Event Hubs. Specific use cases include real-time data processing and integration with external event-streaming platforms for downstream analytics and monitoring.
Are you interested in contributing this feature?