Dremio datalake engine support
I'm looking for a way for the OMD project to manage every tool's metadata and monitor a fairly compact data platform. This platform uses Dremio as its query execution engine, for scalability with bigger datasets and for ease of use (it connects well to a lot of data sources). However, OMD doesn't have a connector to Dremio for metadata extraction and monitoring.
Solution: A Dremio connector for OMD, which can be easily configured through minimal variables like host:port and username:password; SSL would be a nice add-on feature but is not essential for now.
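To make the idea concrete, here is a purely hypothetical sketch of what such a minimal configuration could look like when run programmatically with the openmetadata-ingestion package. No `dremio` source type exists today, so the source type and every connection field below are assumptions illustrating the host:port + username:password idea:

```python
# Hypothetical sketch: a minimal Dremio service config for OpenMetadata.
# The "dremio" source type and its connection fields are assumptions --
# this shows what the proposed connector's config could look like, not a real one.
from metadata.workflow.metadata import MetadataWorkflow  # recent openmetadata-ingestion

config = {
    "source": {
        "type": "dremio",  # hypothetical connector type
        "serviceName": "dremio_dp",
        "serviceConnection": {
            "config": {
                "hostPort": "dremio-host:9047",  # Dremio's default REST port
                "username": "admin",
                "password": "secret",
                # "useSSL": True,  # the optional SSL add-on mentioned above
            }
        },
        "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            "authProvider": "openmetadata",
            "securityConfig": {"jwtToken": "<jwt-token>"},
        }
    },
}

workflow = MetadataWorkflow.create(config)
workflow.execute()
workflow.raise_from_status()
workflow.stop()
```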
Alternative: An external data cataloguing service like HiveDC, DynamoDB, or Nessie (recommended by Dremio). Both OMD and Dremio would use this as the metadata monitoring and tracking tool. However, this excludes Dremio from OMD and bloats the infrastructure a bit (another data solution to take care of).
I'm new to this cloud data engineering thing, and a bit surprised at how limited Dremio is, though the engine is still quite powerful.
@capoolebugchat are you interested in picking up this issue?
@TeddyCr I would love to help.
@TeddyCr yes, sorry about the extra late reply
Thanks @capoolebugchat I'll assign it to you then. We have some information about how to build a new connector here. Make sure to join our slack channel and the #contributor channel for any help.
@rogercezidio please check the other connectors here for contribution ideas; we have many. 😊
Hi folks,
we did a simple implementation of a Dremio custom connector here: https://github.com/TIKI-Institut/openmetadata-dremio-connector. It can only scrape metadata; it has no support for Query Usage, Profiling, etc. We didn't find a way to implement those for a custom connector. For lineage we are simply using dbt at the moment.
We would appreciate any feedback. Is it possible that this will be integrated into OpenMetadata?
Hey @wobu we would recommend contributing the connector directly to the community. This will allow you to leverage the existing code base to implement support for Usage, Profiling, etc.
Here is a link with more information -> https://docs.open-metadata.org/latest/developers/contribute/developing-a-new-connector
We thought about it, and also tried it, but unfortunately setting up the OpenMetadata project under Windows wasn't easy :/ (WSL would maybe be an option). So we decided to just start with a custom connector.
The custom connector is currently sufficient for us, so we won't provide a direct community integration in the near future, until our investment in OpenMetadata and Dremio increases.
Hey @wobu, I hope you are doing well. I have a question about this Dremio connector: does it support lineage, i.e. collecting lineage from Dremio and showing it in OM? Or do you have another solution? Thank you.
hi @mohamed-alaoui6,
the connector itself doesn't support lineage. We currently use DBT for managing our Views / Models in Dremio. With DBT we then export the lineage to OM.
Do you think it is possible to use this connector to export data from Dremio into OM, and then create a script that collects lineage from Dremio with its API (GET /api/v3/catalog/{id}/graph) and transfers that lineage to OM via its API to create relations between entities?
the API you mentioned is a Dremio Enterprise feature only: https://docs.dremio.com/25.x/reference/api/catalog/lineage#lineage-attributes
Nevertheless, the main problem would be extending the "Custom" connector to allow lineage import. I don't know if OpenMetadata itself makes this possible, because AFAIK I only found custom connector examples that could import metadata (and no lineage). The alternative would be integrating the connector directly into the repository and the ingestion package of OpenMetadata.
Other connectors like Trino can handle lineage, and they don't need a dedicated API from Trino / Dremio. I guess the lineage is extracted from the SQL definitions of the views.
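To illustrate that idea, here is a small hedged sketch using the sqllineage library to recover table-level lineage purely from a view's SQL definition. sqllineage is just one way to do this, not necessarily what the Trino connector uses internally, and the view below is invented:

```python
# Sketch: derive table-level lineage from a view definition's SQL text alone.
# The view and table names are made up for illustration.
from sqllineage.runner import LineageRunner

view_sql = """
CREATE VIEW marts.daily_sales AS
SELECT o.order_date, SUM(o.amount) AS revenue
FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.id
GROUP BY o.order_date
"""

runner = LineageRunner(view_sql)
print(runner.source_tables())  # roughly: [raw.customers, raw.orders]
print(runner.target_tables())  # roughly: [marts.daily_sales]
```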
I’ve already worked on this kind of integration with Spark. In my implementation, I first connected OpenMetadata with Hive to import the metadata (schemas, tables, etc.).
Then, I developed a custom Python script that:
- Initializes a Spark session with the OpenLineage Spark agent enabled.
- Executes a Hive-based Spark job (e.g., reading/joining Hive tables and writing results).
- Captures the OpenLineage events into a local JSON file.
- Uses a custom class (OpenMetadataLineageAgent) to:
  - Create/update the pipeline service and pipeline entities in OpenMetadata.
  - Trigger metadata ingestion to ensure tables are indexed.
  - Retrieve input/output entities from OpenMetadata using FQNs.
  - Create lineage edges via the OpenMetadata API by linking source and target tables to the pipeline.
This solution allowed me to capture lineage from Hive/Spark jobs and visualize it in OpenMetadata.
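For concreteness, here is a minimal sketch of the final steps, assuming the OpenMetadata Python SDK and a local JSON file of OpenLineage run events. This is illustrative, not the actual OpenMetadataLineageAgent; the events file name and the FQN mapping are assumptions that depend on how the Hive service was named in OM:

```python
# Minimal sketch (not the author's actual code): read OpenLineage run events
# from a JSON file and create table-to-table lineage edges in OpenMetadata.
import json

from metadata.generated.schema.api.lineage.addLineage import AddLineageRequest
from metadata.generated.schema.entity.data.table import Table
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.generated.schema.type.entityLineage import EntitiesEdge
from metadata.generated.schema.type.entityReference import EntityReference
from metadata.ingestion.ometa.ometa_api import OpenMetadata

# Connect to the OpenMetadata server (host and token are placeholders)
metadata = OpenMetadata(
    OpenMetadataConnection(
        hostPort="http://localhost:8585/api",
        authProvider="openmetadata",
        securityConfig=OpenMetadataJWTClientConfig(jwtToken="<jwt-token>"),
    )
)

def to_fqn(dataset: dict) -> str:
    """Map an OpenLineage dataset to an OpenMetadata FQN.
    The 'hive_service.default' prefix is an assumption; adjust to your setup."""
    return f"hive_service.default.{dataset['name']}"

# File produced by the OpenLineage Spark agent (name is an assumption)
with open("openlineage_events.json") as f:
    events = json.load(f)

for event in events:
    inputs = [metadata.get_by_name(entity=Table, fqn=to_fqn(ds)) for ds in event.get("inputs", [])]
    outputs = [metadata.get_by_name(entity=Table, fqn=to_fqn(ds)) for ds in event.get("outputs", [])]
    # One lineage edge per (source, target) pair found in OM
    for src in filter(None, inputs):
        for dst in filter(None, outputs):
            metadata.add_lineage(
                data=AddLineageRequest(
                    edge=EntitiesEdge(
                        fromEntity=EntityReference(id=src.id, type="table"),
                        toEntity=EntityReference(id=dst.id, type="table"),
                    )
                )
            )
```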
Do you think I could apply a similar approach for Dremio? Specifically:
- Import Dremio metadata into OpenMetadata (OM) with that connector
- Capture lineage information from Dremio using its API
- Represent that lineage as input/output datasets in a JSON file
- And then send this lineage data to OpenMetadata using a custom ingestion agent
I'm essentially trying to replicate the lineage ingestion logic I've implemented for Spark, but adapted for Dremio. Does this sound feasible?
there are a lot of ways to do this :)
your approach would work, with the requirement of having Dremio Enterprise.
Alternatively, you could use the same approach as the OpenMetadata Trino connector (a rough sketch follows this list):
- Query the system table containing all executed queries of Dremio: https://docs.dremio.com/current/reference/sql/system-tables/jobs
- use the existing lineage processing implementations of openmetadata
Either way, if you implement it yourself, you have to start the lineage ingestion workflow externally (because custom connectors won't allow a UI integration for this).
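Here is a hedged sketch of that alternative, assuming Dremio's Arrow Flight endpoint on its default port and that sys.jobs exposes the SQL text in a query column (column names vary by Dremio version, so check the linked docs). Parsing again uses sqllineage; the resulting pairs could then be pushed with add_lineage() as in the earlier sketch:

```python
# Sketch: pull executed queries from Dremio's sys.jobs system table over
# Arrow Flight and derive table-level lineage from the SQL text.
# Host, port, credentials, and the sys.jobs column name are assumptions.
from pyarrow import flight
from sqllineage.runner import LineageRunner

client = flight.FlightClient("grpc+tcp://dremio-host:32010")  # default Flight port
# Basic-auth handshake; returns a bearer-token header to pass on every call
token = client.authenticate_basic_token("dremio_user", "dremio_password")
options = flight.FlightCallOptions(headers=[token])

# Assumed column name; verify with `SELECT * FROM sys.jobs LIMIT 1` first
sql = "SELECT query FROM sys.jobs"

info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)
reader = client.do_get(info.endpoints[0].ticket, options)
queries = reader.read_all().to_pydict()["query"]

for q in queries:
    try:
        runner = LineageRunner(q)
        pairs = [(s, t) for s in runner.source_tables() for t in runner.target_tables()]
    except Exception:
        continue  # not every job is parseable SQL (metadata refreshes, etc.)
    for src, dst in pairs:
        # These pairs would become EntitiesEdge objects sent via
        # OpenMetadata's add_lineage(), as in the Spark sketch above.
        print(f"lineage edge: {src} -> {dst}")
```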
Thanks for the explanation, that makes sense. I’ll try it on my side and let you know if I have any updates.
Hi @wobu ,
I hope you're doing well.
I'm having an issue setting up the Dremio connector, and I was wondering if you could assist me.
This is the error I'm getting:
""
(venv) mohamed@ubuntu:~/dremio-connector/openmetadata-dremio-connector$ metadata ingest -c ./workflow.dremio.yaml
Traceback (most recent call last):
File "/home/mohamed/dremio-connector/openmetadata-dremio-connector/venv/bin/metadata", line 5, in
These are the steps I followed:
"" python3.12 -m venv /home/mohamed/dremio-connector/openmetadata-dremio-connector/venv source venv/bin/activate pip install -e .
Configured workflow.dremio.yaml
metadata ingest -c ./workflow.dremio.yaml ""
Let me know if you have any suggestions or if I'm missing something.
Thanks in advance!
@mohamed-alaoui6
try `pip install .` instead of `pip install -e .`
The editable mode somehow overrides the "metadata" module of OpenMetadata itself. Unfortunately I am no expert in this, so this is the only solution I've got.
Thanks a lot, Mr. @wobu, it’s working!