Dremio
Overview
Adds Dremio support.
Update type - breaking / non-breaking
- [ ] Minor bug fix
- [ ] Documentation improvements
- [ ] Quality of Life improvements
- [x] New features (non-breaking change)
- [ ] New features (breaking change)
- [ ] Other (non-breaking change)
- [ ] Other (breaking change)
What does this solve?
Closes #366.
Outstanding questions
The main outstanding issue is setting up a Dremio cluster for the CI pipeline integration tests. So far I've been testing this locally against our on-prem cluster. I've managed to get it building successfully against the integration test project and our in-house project.
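One option for CI might be to run the standalone image as a service container. This is an untested sketch that assumes GitHub Actions and the dremio-oss image; the bootstrap call, paths, and target names are illustrative only:

```yaml
# Hypothetical CI job - not part of this PR, just an illustration.
jobs:
  integration-dremio:
    runs-on: ubuntu-latest
    services:
      dremio:
        image: dremio/dremio-oss  # community image; fairly memory-hungry
        ports:
          - 9047:9047   # web UI / REST
          - 31010:31010 # JDBC/ODBC
          - 32010:32010 # Arrow Flight
    steps:
      - uses: actions/checkout@v4
      # Dremio starts with no users, so bootstrap an admin account first.
      # (Endpoint/headers as I understand them - worth double-checking.)
      - name: Create first user
        run: |
          curl --retry 10 --retry-delay 10 --retry-connrefused -X PUT \
            http://localhost:9047/apiv2/bootstrap/firstuser \
            -H "Content-Type: application/json" \
            -H "Authorization: _dremionull" \
            -d '{"userName":"dremio","firstName":"ci","lastName":"ci","email":"ci@example.com","createdAt":0,"password":"dremio123"}'
      # Project path and target name are illustrative.
      - name: Run integration tests
        run: dbt build --project-dir integration_test_project --target dremio
```

A standalone dremio-oss instance can't create Iceberg tables on its own, though, so this would still need an object store and metastore alongside it (see the Docker discussion further down).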
There is potential to refactor the upload macros, as there is quite a lot of code duplication between them. I refrained from doing that refactoring here to keep this PR focused on adding Dremio support, but it would make sense to handle it in a separate PR.
What databases have you tested with?
- [ ] Snowflake
- [ ] Google BigQuery
- [ ] Databricks
- [ ] Spark
- [x] Dremio
- [ ] N/A
Hi @maxfirman. Thanks for taking the time to add this functionality.
One of the team will spend some time reviewing it and get back to you.
Hi @glsdown, thanks for taking the time to review my PR.
I can see that my changes caused a regression for upload_sources and dim_dbt__snapshots. I've just pushed a commit that will hopefully resolve those failures.
Hi @maxfirman. Thanks for your work on this. As an update on progress, I've just pushed some changes that update some of the scripts with changes that have recently been introduced. If you could test that they still work with your instances, that would be great.
My current sticking point, though, is being able to test it all. I'm working on getting Dremio set up locally, but I don't have any familiarity with it, so whilst I can get it running on Docker, it's taking me some time to get dbt working with it. If there's any setup you could recommend, that would be great.
We don't want to merge any code that doesn't have testing in place, so that's why this is taking a bit of time to get through.
@glsdown thanks very much for taking a look at this. I appreciate it's a reasonable amount of work to set up integration testing, and I'm happy to help as much as I can.
I've tested your latest changes. I had to make a small patch, but I can now build the test project locally without errors.
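For reference, the profile I've been testing with looks roughly like this. Treat it as a sketch: the field names follow the dbt-dremio adapter's self-hosted ("software") profile and may differ between adapter versions, and the host, source, and space names here are made up:

```yaml
# Sketch of a profiles.yml entry for a self-hosted Dremio cluster.
dbt_artifacts:
  target: dremio
  outputs:
    dremio:
      type: dremio
      threads: 4
      software_host: dremio.mycompany.internal  # hypothetical host
      port: 9047
      user: "{{ env_var('DREMIO_USER') }}"
      password: "{{ env_var('DREMIO_PASSWORD') }}"
      use_ssl: false
      # Where physical (Iceberg) tables are written:
      object_storage_source: s3_lake           # hypothetical source name
      object_storage_path: dbt_artifacts
      # Where views are created:
      dremio_space: analytics                  # hypothetical space name
      dremio_space_folder: dbt_artifacts
```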
There are a couple of ways to go in terms of setting up a Dremio instance for testing. The most straightforward approach, assuming you have access to an AWS account, would be to spin up a Dremio Cloud account and link it to AWS. You would then need to connect to a metastore in order to be able to create Apache Iceberg tables. Your two options would be either AWS Glue Catalog or Dremio's Arctic metastore. I don't have much experience with the latter, so I would probably suggest going with AWS Glue Catalog. In either case you will also need to spin up an S3 bucket to store the actual data and configure it as the hive.metastore.warehouse.dir.
The alternative approach would be to go fully DIY and use the standalone dremio-oss Docker image. The problem with this is that you would also need to spin up a MinIO object store and a Hive Metastore in order to be able to create Iceberg tables. Getting everything configured and talking to each other would probably require more effort than the Dremio Cloud / AWS approach.
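If it helps, a compose file for the DIY route might look something like this. It's an untested sketch: the image tags, bucket name, and metastore wiring are assumptions, the s3a endpoint/credential settings (and possibly hadoop-aws jars) for MinIO are omitted, and you would still need to register the Hive source and MinIO credentials in the Dremio UI afterwards:

```yaml
# Sketch of a local Dremio + MinIO + Hive Metastore stack (untested).
services:
  dremio:
    image: dremio/dremio-oss
    ports:
      - "9047:9047"   # web UI / REST
      - "31010:31010" # JDBC/ODBC
      - "32010:32010" # Arrow Flight

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # console

  metastore:
    image: apache/hive:4.0.0  # tag illustrative
    environment:
      SERVICE_NAME: metastore
      # Point the warehouse at a MinIO bucket; this is the
      # hive.metastore.warehouse.dir setting mentioned above.
      SERVICE_OPTS: "-Dhive.metastore.warehouse.dir=s3a://warehouse/"
    ports:
      - "9083:9083"   # thrift metastore endpoint
    depends_on:
      - minio
```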