
Dremio datalake engine support

Open capoolebugchat opened this issue 1 year ago • 5 comments

I'm looking for a way for the OMD project to govern every tool's metadata and monitor an extra-compact data platform. This platform uses Dremio as its query execution engine for scalability with bigger datasets and ease of use (it connects well to a lot of data sources). However, OMD doesn't have a connector to Dremio for metadata extraction and monitoring.

Solution: A Dremio connector for OMD that can be easily configured through minimal variables like host:port and username:password. SSL is a nice add-on feature but is not essential for now.

Alternative: An external data cataloguing service like HiveDC, DynamoDB, Nessie (Dremio recommended), ... Both OMD and Dremio would use this as the metadata monitoring and tracking tool. However, this excludes Dremio from OMD and bloats the infra a bit (another data solution to take care of).

I'm new to this cloud data engineering thing and a bit surprised by how limited Dremio is, though the engine is still quite powerful.

capoolebugchat avatar Mar 21 '24 09:03 capoolebugchat

@capoolebugchat are you interested in picking up this issue?

TeddyCr avatar Mar 22 '24 16:03 TeddyCr

@TeddyCr I would love to help.

rogercezidio avatar Apr 03 '24 13:04 rogercezidio

@TeddyCr yes, sorry about the extra late reply

capoolebugchat avatar Apr 03 '24 14:04 capoolebugchat

Thanks @capoolebugchat I'll assign it to you then. We have some information about how to build a new connector here. Make sure to join our slack channel and the #contributor channel for any help.

@rogercezidio please check the other connectors here if you want to contribute; we have many. 😊

TeddyCr avatar Apr 04 '24 11:04 TeddyCr

Hi folks,

we did a simple implementation of a Dremio custom connector here: https://github.com/TIKI-Institut/openmetadata-dremio-connector. It can only scrape metadata. It has no support for query usage, profiling, etc. We didn't find a way to implement that for a custom connector. For lineage we are simply using DBT at the moment.

We would appreciate any feedback. Is it possible that this will be integrated into OpenMetadata?

wobu avatar Aug 27 '24 06:08 wobu

Hey @wobu, we would recommend contributing the connector directly to the community. This will allow you to leverage the existing code base to implement support for Usage, Profiling, etc.

Here is a link with more information -> https://docs.open-metadata.org/latest/developers/contribute/developing-a-new-connector

TeddyCr avatar Sep 02 '24 06:09 TeddyCr

We thought about it, and also tried it, but unfortunately setting up the OpenMetadata project under Windows wasn't easy :/ (WSL might be an option). So we decided to just start with a custom connector.

The custom connector is currently sufficient for us, so we won't provide a direct community integration in the near future, until our investment in OpenMetadata and Dremio increases.

wobu avatar Sep 04 '24 12:09 wobu

hey @wobu, I hope you are doing well. I have a question about this Dremio connector: does it support lineage, i.e. collecting lineage from Dremio and showing it in OM? Or do you have another solution? Thank you.

mohamed-alaoui6 avatar Jun 01 '25 19:06 mohamed-alaoui6

hi @mohamed-alaoui6,

the connector itself doesn't support lineage. We currently use DBT for managing our Views / Models in Dremio. With DBT we then export the lineage to OM.

wobu avatar Jun 04 '25 05:06 wobu

Do you think it is possible to use this connector to export metadata from Dremio into OM, and then create a script that collects lineage from Dremio with its API (GET /api/v3/catalog/{id}/graph) and transfers that lineage to OM via its API to create relations between entities?

mohamed-alaoui6 avatar Jun 04 '25 13:06 mohamed-alaoui6

The API you mentioned is a Dremio Enterprise feature only: https://docs.dremio.com/25.x/reference/api/catalog/lineage#lineage-attributes

Nevertheless, the main problem would be extending the "Custom" connector to allow lineage import. I don't know whether OpenMetadata itself allows this; AFAIK, the only custom connector examples I found import metadata only (no lineage). The alternative would be integrating the connector directly into the repository and the ingestion package of OpenMetadata.

Other connectors like Trino can handle lineage without needing a dedicated API from Trino / Dremio. I guess the lineage is extracted from the SQL definitions of the views.

wobu avatar Jun 04 '25 13:06 wobu

I’ve already worked on this kind of integration with Spark. In my implementation, I first connected OpenMetadata with Hive to import the metadata (schemas, tables, etc.).

Then, I developed a custom Python script that:

  • Initializes a Spark session with the OpenLineage Spark agent enabled.
  • Executes a Hive-based Spark job (e.g., reading/joining Hive tables and writing results).
  • Captures the OpenLineage events into a local JSON file.
  • Uses a custom class (OpenMetadataLineageAgent) to:
      • Create/update the pipeline service and pipeline entities in OpenMetadata.
      • Trigger metadata ingestion to ensure tables are indexed.
      • Retrieve input/output entities from OpenMetadata using FQNs.
      • Create lineage edges via the OpenMetadata API by linking source and target tables to the pipeline.
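
The "capture events, then read inputs/outputs" step above could look roughly like the sketch below. It is only an illustration, not the script described in this comment: it assumes the events were written one JSON object per line, and the function name lineage_pairs is hypothetical. The fields eventType, inputs, outputs and name follow the OpenLineage event format.

```python
import json
from typing import Iterable, Set, Tuple


def lineage_pairs(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (input dataset name, output dataset name) pairs.

    Assumption: each non-empty line is one OpenLineage run event as JSON.
    Only COMPLETE events are used, so unfinished runs are ignored.
    """
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("eventType") != "COMPLETE":
            continue
        inputs = [d["name"] for d in event.get("inputs", [])]
        outputs = [d["name"] for d in event.get("outputs", [])]
        # Every input of the run is treated as upstream of every output.
        pairs.update((i, o) for i in inputs for o in outputs)
    return pairs
```

Mapping these dataset names to OpenMetadata FQNs is deployment-specific and is left out here.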

This solution allowed me to capture lineage from Hive/Spark jobs and visualize it in OpenMetadata.

Do you think I could apply a similar approach for Dremio? Specifically:

  • Import Dremio metadata into OpenMetadata (OM) with that connector
  • Capture lineage information from Dremio using its API
  • Represent that lineage as input/output datasets in a JSON file
  • Send this lineage data to OpenMetadata using a custom ingestion agent

I’m essentially trying to replicate the lineage ingestion logic I've implemented for Spark, but adapted for Dremio. Does this sound feasible?

mohamed-alaoui6 avatar Jun 04 '25 14:06 mohamed-alaoui6

there are a lot of ways to do this :)

Your approach would work, provided you have Dremio Enterprise.

Alternatively, you could use the same approach as the OpenMetadata Trino connector:

  • Query the system table containing all executed queries of Dremio: https://docs.dremio.com/current/reference/sql/system-tables/jobs
  • Use the existing lineage processing implementations of OpenMetadata

Either way, when you implement it yourself, you have to start the lineage ingestion workflow externally (because custom connectors won't allow a UI integration for this).
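
For the "external" part, a script could push table-to-table edges to OpenMetadata's lineage endpoint (PUT /api/v1/lineage). The sketch below only builds the request and does not send it. The helper names are hypothetical, the table IDs must be looked up in OM beforehand, and the lineageDetails.sqlQuery field is an assumption that should be checked against the API schema of your OM version.

```python
import json
import urllib.request
from typing import Optional


def build_lineage_request(from_table_id: str, to_table_id: str,
                          sql_query: Optional[str] = None) -> dict:
    """Build the body for one table-to-table lineage edge."""
    edge = {
        "fromEntity": {"id": from_table_id, "type": "table"},
        "toEntity": {"id": to_table_id, "type": "table"},
    }
    if sql_query:
        # Assumption: the SQL that produced the edge goes into lineageDetails.
        edge["lineageDetails"] = {"sqlQuery": sql_query}
    return {"edge": edge}


def make_put_request(base_url: str, token: str,
                     payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request for the lineage endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/lineage",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it would then be urllib.request.urlopen(make_put_request(...)), with base_url and token taken from your own OM instance.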

wobu avatar Jun 04 '25 14:06 wobu

Thanks for the explanation, that makes sense. I’ll try it on my side and let you know if I have any updates.

mohamed-alaoui6 avatar Jun 04 '25 14:06 mohamed-alaoui6

Hi @wobu ,

I hope you're doing well.

I'm having an issue setting up the Dremio connector, and I was wondering if you could assist me.

This is the error I'm getting:

    (venv) mohamed@ubuntu:~/dremio-connector/openmetadata-dremio-connector$ metadata ingest -c ./workflow.dremio.yaml
    Traceback (most recent call last):
      File "/home/mohamed/dremio-connector/openmetadata-dremio-connector/venv/bin/metadata", line 5, in <module>
        from metadata.cmd import metadata
      File "/home/mohamed/dremio-connector/openmetadata-dremio-connector/venv/lib/python3.12/site-packages/metadata/cmd.py", line 24, in <module>
        from metadata.cli.app import run_app
      File "/home/mohamed/dremio-connector/openmetadata-dremio-connector/venv/lib/python3.12/site-packages/metadata/cli/app.py", line 19, in <module>
        from metadata.config.common import load_config_file
      File "/home/mohamed/dremio-connector/openmetadata-dremio-connector/venv/lib/python3.12/site-packages/metadata/config/common.py", line 24, in <module>
        from metadata.ingestion.models.custom_pydantic import BaseModel
    ModuleNotFoundError: No module named 'metadata.ingestion.models'

These are the steps I followed:

    python3.12 -m venv /home/mohamed/dremio-connector/openmetadata-dremio-connector/venv
    source venv/bin/activate
    pip install -e .

    # Configured workflow.dremio.yaml

    metadata ingest -c ./workflow.dremio.yaml

Let me know if you have any suggestions or if I'm missing something.

Thanks in advance!

mohamed-alaoui6 avatar Jun 26 '25 13:06 mohamed-alaoui6

@mohamed-alaoui6

Try "pip install ." instead of "pip install -e .". The editable mode somehow overrides the "metadata" module of OpenMetadata itself. Unfortunately I am no expert in this, so this is the only solution I've got.
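
To see which "metadata" package Python actually resolves inside the venv, a small standard-library check like the one below may help. module_origin is a hypothetical helper name; it relies only on importlib.util.find_spec.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages have no single origin file; show their search paths.
    locations = spec.submodule_search_locations
    return ", ".join(locations) if locations else None


if __name__ == "__main__":
    print(module_origin("metadata"))
```

If the printed path points into the connector checkout rather than site-packages/metadata, that would be consistent with the editable install shadowing the OpenMetadata package.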

wobu avatar Jun 27 '25 07:06 wobu

Thanks a lot, Mr. @wobu, it’s working!

mohamed-alaoui6 avatar Jun 27 '25 18:06 mohamed-alaoui6