OpenMetadata
support for duckdb
https://openmetadata.slack.com/archives/C02B6955S4S/p1697099341468359
The lineage models at https://github.com/open-metadata/OpenMetadata/blob/f63881b8b6f78b39a4a014eb3f67df62ce170780/ingestion/src/metadata/ingestion/lineage/models.py#L75 already list duckdb.
dbt supports DuckDB,
but for OM to ingest dbt's nodes, the DuckDB tables would somehow need to be loaded beforehand.
DuckDB is a supported dialect for the lineage engine, but we do not have a connector yet.
This would be a good contribution from our community. To get started, watch the tutorial or review similar PRs.
Thanks!
I see this issue is still open. I am interested in this project and contributing to it. Please assign a good first issue to me to work on. Thank you!
hi @saurabhyadav1985, assigned, thanks
Hey @saurabhyadav1985, you forgot to add the DuckDB.md file in openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database. Please check it.
hi @saurabhyadav1985 I see that the duckdb connection is a copy of the greenplum connection https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/duckdbConnection.json
What properties do we actually need to connect there?
Actually, it looks like almost all of the Greenplum code has been copy-pasted into DuckDB. I'm not sure that's the best approach here, since DuckDB might have its own types, connection specifications, etc. This might need another iteration to get the right information.
I think it's worth reverting the change and spending some time reviewing the requirements of the connector, rather than shipping it as-is and having trouble ingesting and migrating data after this gets updated.
@pmbrull what is the state of this? What exactly is missing? Do you have some clear instructions? Maybe I can find time to contribute.
@geoHeil the past contribution did not solve the actual problem at hand, so it was reverted to avoid any confusion.
If you'd like to contribute, you can follow this guide https://docs.open-metadata.org/v1.3.x/developers/contribute/developing-a-new-connector
Thanks
This is really generic. If we want to reuse a DB connector, let's say Postgres, as a template for DuckDB, can we speed up the process? I.e., is it enough to create only the ingestion connector and keep whatever Postgres offers for the data model (as that should be the same on the OM server side)?
You can take other PRs as examples. I shared one above. But in the end, type mapping, SQLAlchemy setup, etc. need to be specific to each connector. The overall framework is already designed so that you have to touch as few things as possible.
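To illustrate why type mapping has to be handled per connector, here is a minimal sketch of what a DuckDB type mapping might look like. It is not code from the OpenMetadata repository: `DUCKDB_TO_OM` and `map_duckdb_type` are hypothetical names, and the target type names on the right-hand side are assumed to match OpenMetadata's column data types and should be checked against the schema.

```python
import re

# Hypothetical mapping from DuckDB type names to OpenMetadata column data types.
# The right-hand values are assumptions; verify them against the OM table schema.
DUCKDB_TO_OM = {
    "BOOLEAN": "BOOLEAN",
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "HUGEINT": "NUMBER",   # 128-bit integer, no direct counterpart assumed
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DECIMAL",
    "VARCHAR": "STRING",   # DuckDB VARCHAR is unbounded, so no length is assumed
    "BLOB": "BLOB",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "INTERVAL": "INTERVAL",
    "UUID": "UUID",
    "STRUCT": "STRUCT",
    "MAP": "MAP",
}


def map_duckdb_type(raw_type: str) -> str:
    """Map a raw DuckDB type string to an assumed OpenMetadata data type."""
    # Drop parameters such as "(18,3)" and normalize case.
    base = re.sub(r"\(.*\)", "", raw_type).strip().upper()
    # DuckDB writes list types with a trailing "[]", e.g. "INTEGER[]".
    if base.endswith("[]"):
        return "ARRAY"
    # Anything not covered falls back to UNKNOWN rather than guessing.
    return DUCKDB_TO_OM.get(base, "UNKNOWN")


if __name__ == "__main__":
    for sample in ("decimal(18,3)", "INTEGER[]", "varchar", "GEOMETRY"):
        print(sample, "->", map_duckdb_type(sample))
```

Reusing the Postgres or Greenplum mapping unchanged would miss DuckDB-specific types such as `HUGEINT` or list types, which is the kind of gap the review above was concerned about.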
I have created some preliminary DuckDB support. However, it lives outside of OM's standard ingestion framework and simply calls the API manually. Would anyone be interested in reusing this?
Hi @geoHeil! What's the state of this? I would also contribute, or even build a basic custom connector, as we will definitely need this.
I am using a home-grown connector. It does not have all the features/bells & whistles, but if desired I could share it somehow.
I am not sure I would have the time to make a full-blown integration here. It is basically a monkey-patched version of the Python integration: I use some Python code to compute the metadata that is needed and then push it to the OM API.
For a proper integration (i.e. showing the DuckDB icon instead of the Postgres one used here), someone would have to do more.
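For readers wondering what "compute the metadata and push it to the OM API" could look like, here is a rough sketch. It is not the home-grown connector described above. The server URL, the schema fully qualified name, the sample rows, and the helper names are all hypothetical, and the payload fields (`name`, `databaseSchema`, `columns`) are assumed to follow OpenMetadata's create-table request. The requests are only built, not sent.

```python
import json
import urllib.request
from collections import defaultdict

# Assumed values for illustration only.
OM_TABLES_URL = "http://localhost:8585/api/v1/tables"  # assumed local OM server
SCHEMA_FQN = "duckdb_service.main_db.main"             # hypothetical schema FQN

# Rows shaped like the result of a query such as
#   SELECT table_name, column_name, data_type FROM information_schema.columns
# against a DuckDB database. These values are made-up samples.
SAMPLE_ROWS = [
    ("orders", "id", "BIGINT"),
    ("orders", "placed_at", "TIMESTAMP"),
    ("customers", "name", "VARCHAR"),
]

# Tiny assumed type mapping; a real connector needs a complete one.
TYPE_MAP = {"BIGINT": "BIGINT", "TIMESTAMP": "TIMESTAMP", "VARCHAR": "STRING"}


def build_table_payloads(rows, schema_fqn):
    """Group column rows by table and build one payload per table."""
    columns = defaultdict(list)
    for table, column, raw_type in rows:
        columns[table].append(
            {
                "name": column,
                "dataType": TYPE_MAP.get(raw_type, "UNKNOWN"),
                "dataTypeDisplay": raw_type.lower(),
            }
        )
    return [
        {"name": table, "databaseSchema": schema_fqn, "columns": cols}
        for table, cols in columns.items()
    ]


def build_request(payload, token):
    """Prepare (but do not send) a PUT request carrying one table payload."""
    return urllib.request.Request(
        OM_TABLES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


payloads = build_table_payloads(SAMPLE_ROWS, SCHEMA_FQN)
requests_to_send = [build_request(p, token="<jwt-token>") for p in payloads]

if __name__ == "__main__":
    for req in requests_to_send:
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending each prepared request with `urllib.request.urlopen` would require a running server, an existing database service and schema, and a valid token. As noted in the thread, a service registered this way still shows up under whichever service type it was created with, not as DuckDB.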