dbt-clickhouse

Support chDB as a driver

Open ThomAub opened this issue 1 year ago • 5 comments

Summary

This is a first proposal for #297. It currently works for our use case: embedded ClickHouse for unit tests with dbt.

It would be great to get some feedback on the direction of this implementation! We would also like to know how we should add unit and integration tests, or even a tutorial.

Checklist

Delete items not relevant to your PR:

  • [ ] Unit and integration tests covering the common scenarios were added
  • [ ] A human-readable description of the changes was provided to include in CHANGELOG
  • [ ] For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

ThomAub avatar Oct 18 '24 09:10 ThomAub

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Oct 18 '24 09:10 CLAassistant

Also tagging @auxten in case you have insight on the chDB part.

ThomAub avatar Oct 18 '24 10:10 ThomAub

It looks great. I'm here for any issue or question :)

auxten avatar Oct 21 '24 06:10 auxten

@ThomAub There are some lint issues to fix, and we also need unit and integration tests.

You can "Allow edits from maintainers."

auxten avatar Oct 28 '24 11:10 auxten

Hi folks,

Thank you for your contribution!

Before reviewing your PR, please add the following:

  • A short description with a link to this PR in the changelog (please keep the current format we have in the Changelog file).
  • As this feature isn't minor, please add tests to cover the added functionality and documentation to the readme file.

Looking forward to reviewing this!

BentsiLeviav avatar Oct 30 '24 10:10 BentsiLeviav

@ThomAub In chdb v2.2.0b0, chdb.dbapi has been completely refactored. The API and behavior are supposed to be unchanged. The improvements are:

  • Stateful queries, with a long-lived ClickHouse engine instance bound to the connection
  • Python DB-API reimplemented on top of the new API, supporting both persistent and in-memory engines
  • ClickHouse Memory engine support
  • Some performance improvement (~43%)
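To make "stateful" concrete, here is a minimal sketch of what a long-lived engine bound to a DB-API connection allows: state created by one statement is visible to the next statement on the same connection. The helper `run_stateful_session` is hypothetical, and the example uses the standard-library `sqlite3` module as a stand-in so that it runs without chDB installed. With chDB one would presumably pass `chdb.dbapi.connect` and ClickHouse-flavoured DDL instead; that is an assumption and is not exercised here.

```python
import sqlite3
from typing import Any, Callable, List, Sequence


def run_stateful_session(
    connect: Callable[[], Any],
    setup_statements: Sequence[str],
    query: str,
) -> List[tuple]:
    """Run setup statements and a query on ONE DB-API connection.

    Because every statement goes through the same connection, the
    table created during setup still exists when the query runs.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        for statement in setup_statements:
            cur.execute(statement)
        cur.execute(query)
        return list(cur.fetchall())
    finally:
        conn.close()


# Stand-in driver: sqlite3 with an in-memory database.
# Assumption: with chDB, `connect` would be chdb.dbapi.connect and the
# DDL would need ClickHouse syntax (for example an ENGINE clause).
rows = run_stateful_session(
    connect=lambda: sqlite3.connect(":memory:"),
    setup_statements=[
        "CREATE TABLE events (id INTEGER, name TEXT)",
        "INSERT INTO events VALUES (1, 'a'), (2, 'b')",
    ],
    query="SELECT id, name FROM events ORDER BY id",
)
print(rows)
```

Taking the `connect` callable as a parameter keeps the session logic independent of the driver, which is the property a dbt adapter relies on when it swaps one DB-API backend for another.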

auxten avatar Nov 26 '24 06:11 auxten

I am eagerly waiting for this so that I can use chDB instead of DuckDB for local transformations (WIP). Our warehouse is ClickHouse, so it would make a lot of sense for us to use compatible SQL for local processing as well.

@ThomAub Thanks for the great work. When do you think it will be generally available, ideally with some documentation or a blog post?

arun11299 avatar Dec 04 '24 15:12 arun11299

We are actively testing this in-house, so we will turn it into a proper PR with documentation before the end of the year.

ThomAub avatar Dec 04 '24 16:12 ThomAub

@ThomAub Can you please update us on the state of the PR? Is any more work required to get this merged?

arun11299 avatar Jan 29 '25 07:01 arun11299

Hello @arun11299, we are facing some issues using chDB for testing purposes. We have a ClickHouse cluster, and chDB is not working well with distributed or replicated tables. Do you use dictionaries or replicated tables?
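One way to picture the replicated-table problem: DDL written for a cluster names `Replicated*MergeTree` engines, whose first two arguments (the Keeper path and the replica name) have no meaning on an embedded single node. The following is a minimal sketch, not something proposed in this thread, of rewriting such DDL to the non-replicated engine before running it locally. The helper name `to_local_engine` is hypothetical, the regex only handles the common form with two quoted leading arguments, and whether this rewrite is sufficient for chDB in a given project is an assumption.

```python
import re

# Matches e.g. ReplicatedReplacingMergeTree('/path/{shard}/t', '{replica}', ver)
_REPLICATED = re.compile(
    r"Replicated(?P<engine>\w*MergeTree)"   # engine family after the prefix
    r"\s*\(\s*'[^']*'\s*,\s*'[^']*'\s*"     # Keeper path and replica name
    r"(?:,\s*(?P<rest>[^)]*?)\s*)?\)"       # remaining engine arguments, if any
)


def to_local_engine(ddl: str) -> str:
    """Replace Replicated*MergeTree(...) with the plain *MergeTree(...) engine.

    The two replication arguments are dropped; any remaining engine
    arguments (such as a version column) are kept unchanged.
    """
    def _swap(match: "re.Match[str]") -> str:
        rest = match.group("rest") or ""
        return f"{match.group('engine')}({rest})"

    return _REPLICATED.sub(_swap, ddl)


cluster_ddl = (
    "CREATE TABLE t (id UInt64, v UInt32) "
    "ENGINE = ReplicatedReplacingMergeTree("
    "'/clickhouse/tables/{shard}/t', '{replica}', v) "
    "ORDER BY id"
)
local_ddl = to_local_engine(cluster_ddl)
print(local_ddl)
```

This only addresses the engine clause. Distributed tables and dictionaries, which are also mentioned in this thread, would need separate handling.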

ThomAub avatar Jan 30 '25 09:01 ThomAub

@ThomAub Can you describe the problem in detail? Which version of chDB are you using? For the dbt scenario I really recommend chDB v3.0.0.

auxten avatar Jan 30 '25 10:01 auxten

@ThomAub Maybe a dumb question, but why does chDB need to worry about replicated MergeTree tables? Isn't it just embedded and hence single-node?

Anyway, in my use case, I want to use chDB to implement and test data pipelines in the dev environment, with ClickHouse Cloud taking its place in production.

arunmu-nx avatar Feb 01 '25 17:02 arunmu-nx

@ThomAub How's your testing getting on? Curious to know if you've found or noticed anything.

laeg avatar May 27 '25 10:05 laeg