dbt-clickhouse
Support chDB as a driver
Summary
This is a first proposal for #297. It currently works for our use case: an embedded ClickHouse for unit tests with dbt.
It would be great to get some feedback on the direction of this implementation! We would also like to know how we should add unit and integration tests, and whether a tutorial would be useful.
Checklist
Delete items not relevant to your PR:
- [ ] Unit and integration tests covering the common scenarios were added
- [ ] A human-readable description of the changes was provided to include in CHANGELOG
- [ ] For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials
Also tagging @auxten in case you have any insight on the chDB part.
It looks great. I'm here for any issue or question :)
@ThomAub There are some lint issues to fix, and we also need unit and integration tests.
You can "Allow edits from maintainers."
Hi folks,
Thank you for your contribution!
Before reviewing your PR, please add the following:
- A short description with a link to this PR in the changelog (please keep the current format we have in the Changelog file).
- As this feature isn't minor, please add tests to cover the added functionality and documentation to the readme file.
Looking forward to reviewing this!
@ThomAub In chdb v2.2.0b0, chdb.dbapi has been completely refactored. The API and behavior are supposed to be unchanged. The good parts are:
- Stateful queries, with a long-lived ClickHouse engine instance bound to the connection
- Python DB-API reimplemented on top of the new API, with both persistent and in-memory engines supported
- ClickHouse Memory engine support
- Some performance improvement (~43%)
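To illustrate what "stateful" means at the DB-API level, here is a minimal sketch, not code from this PR. The helper `query_rows` is a hypothetical name, and `sqlite3` is used only as a stand-in PEP 249 driver so the snippet runs without chDB installed. With chDB, the assumption is that `chdb.dbapi.connect` can be passed as the factory instead, with the SQL adapted to the ClickHouse dialect (for example, adding an `ENGINE = Memory` clause to `CREATE TABLE`).

```python
import sqlite3


def query_rows(connect, statements):
    """Run all statements on ONE DB-API connection and return the rows
    of the last statement. `connect` is any zero-argument PEP 249
    connection factory (hypothetical helper, not part of chDB or dbt)."""
    conn = connect()
    try:
        cur = conn.cursor()
        for sql in statements:
            cur.execute(sql)
        rows = cur.fetchall()
        cur.close()
        return rows
    finally:
        conn.close()


def chdb_connect():
    """Assumed chDB factory; imported lazily so the sketch runs without chDB."""
    from chdb import dbapi
    return dbapi.connect()


# Stand-in run with sqlite3: the table created by the first statement is
# still visible to the later ones because they share one connection.
rows = query_rows(
    lambda: sqlite3.connect(":memory:"),
    [
        "CREATE TABLE t (x INTEGER)",
        "INSERT INTO t VALUES (1), (2)",
        "SELECT sum(x) FROM t",
    ],
)
print(rows)
```

The point of the sketch is only that state created by earlier statements survives for later ones on the same connection, which is what a dbt adapter relies on when it creates and then queries models.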
I am eagerly waiting for this so that I can use chDB instead of DuckDB for local transformations (WIP). Our warehouse is ClickHouse, so it would make a lot of sense for us to use compatible SQL for local processing as well.
@ThomAub Thanks for the great work. When do you think it will be generally available, perhaps with a helpful doc or blog post?
We are actively testing this in-house, so we will soon turn it into a proper PR with documentation, before the end of the year.
@ThomAub Can you please update us on the state of the PR? Is any more work required to get this merged?
Hello @arun11299. We are facing some issues using chDB for testing purposes. We have a ClickHouse cluster, and chDB does not work well with distributed or replicated tables. Do you use dictionaries or replicated tables?
@ThomAub Can you describe the problem in detail? Which version of chDB are you using? For the dbt scenario, I really recommend chDB v3.0.0.
@ThomAub Maybe a dumb question, but why does chDB need to worry about replicated MergeTree tables? Isn't it just embedded, and hence single-node?
Anyway, in my use case I want to use chDB to implement and test data pipelines in the dev environment, with ClickHouse Cloud taking its place in production.
@ThomAub how's your testing getting on? curious to know if you've found/noticed anything?