Connection for Dremio Lakehouse
Please Describe The Problem To Be Solved
The Problem: Other than Snowflake, there is a lack of connectors to other lakehouse solutions. While a Databricks connector would be nice for many corporate production runs, in the interest of open source, a Dremio connector might be more appreciated by the community. This request is to build a Dremio connector for Quary.
Optional: Suggest A Solution
Looking into the code architecture, it seems that the bulk of connectors are maintained within rust/quary-databases/src/databases_<flavor>.rs and rust/core/src/database_<flavor>.rs. Inspection shows a common class interface already designed across both. For Dremio, there are a number of protocols available, including REST, JDBC, and ODBC. However, with a Rust build, it may be advantageous to use the Arrow Flight protocol, as Dremio supports it well and it can lead to a 20x speed-up over JDBC. In fact, this issue could even be extended to a generic "Arrow Flight Connector" type.
A possible plan includes:
- Review full connection interface by building "skeleton" version of rust/quary-databases/src/databases_dremio.rs.
- Review Dremio docs, although may not be needed if just functional Arrow SQL.
- Get feedback on any other requirements to implement a Dremio connector - thought I saw something else in SQL interfacing code.
- Design for any other feedback (e.g. any other needed *_dremio.rs files).
- Unit test.
- Review & release into the wild.
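As a rough illustration of the "skeleton" step in the plan above, here is a minimal sketch. The `Connector` trait, `QueryResult`, and `DremioSkeleton` are hypothetical names made up for this example; they are not Quary's actual interface, which would need to be read from the existing databases_<flavor>.rs files.

```rust
// Hypothetical sketch only: `Connector`, `QueryResult`, and `DremioSkeleton`
// are illustrative names, not Quary's real types.

/// Rows returned by a query, kept as plain strings for simplicity.
#[derive(Debug, PartialEq)]
pub struct QueryResult {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// The operations a connector might be expected to provide (assumed).
pub trait Connector {
    fn name(&self) -> &'static str;
    fn list_tables(&self) -> Result<Vec<String>, String>;
    fn query(&self, sql: &str) -> Result<QueryResult, String>;
}

/// Skeleton: every method compiles but reports that it is not implemented yet.
pub struct DremioSkeleton {
    pub host: String,
    pub port: u16,
}

impl Connector for DremioSkeleton {
    fn name(&self) -> &'static str {
        "dremio"
    }

    fn list_tables(&self) -> Result<Vec<String>, String> {
        Err(format!(
            "list_tables not implemented for {}:{}",
            self.host, self.port
        ))
    }

    fn query(&self, sql: &str) -> Result<QueryResult, String> {
        Err(format!("query not implemented: {}", sql))
    }
}
```

Filling in a skeleton like this method by method is one way to discover which parts of the interface need Dremio-specific behaviour.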
Happy to help on this to build out my Rust expertise...
Thanks for this! We'll have a quick look into this today!
Here's a first draft: https://github.com/quarylabs/quary/pull/446. It doesn't work yet and still needs quite a bit of filling in, but I think it should give you the general structure.
There are a few things to add:
- Proper integration tests could be done with dremio/dremio-oss.
- I don't know Dremio, but there are quite a few auth methods, so we should make sure we cover the right ones.
- There seems to be a distinction in flight of read queries and write queries, which is a difference that may make things a little more complicated.
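On the read/write distinction: as far as I understand, Arrow Flight SQL uses separate commands for statements that return rows and statements that only modify data, so a connector probably needs to decide which path a statement takes. A minimal sketch of one way to do that is below; `StatementKind` and `classify_statement` are hypothetical names, and the keyword list is a naive assumption that ignores leading SQL comments and dialect-specific cases.

```rust
// Hypothetical sketch: route a SQL statement to a "query" or "update" path
// based on its first keyword. This is a heuristic, not a SQL parser.

#[derive(Debug, PartialEq)]
pub enum StatementKind {
    /// Expected to return rows (read path).
    Query,
    /// Expected to modify data or schema (write path).
    Update,
}

pub fn classify_statement(sql: &str) -> StatementKind {
    // Take the first whitespace-separated token, case-insensitively.
    let first = sql
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();

    match first.as_str() {
        "SELECT" | "WITH" | "SHOW" | "DESCRIBE" | "EXPLAIN" | "VALUES" => StatementKind::Query,
        // Anything else (CREATE, DROP, INSERT, ...) is treated as a write.
        _ => StatementKind::Update,
    }
}
```

A real implementation would more likely use a proper parser, or let the caller state explicitly which kind of statement it is sending.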
I mostly did it out of curiosity for:
- Dremio which is cool!
- ArrowFlight: We have some translation layers and I am wondering whether quary's internal format should just be arrow.
Love the idea of internal format of Arrow - it looks very sweet!
I'll pull the branch and try to get some feedback by Wed
A first draft with this in it is being pushed at the moment. It works with username/password and no SSL.
    let host = env::var("DREMIO_HOST")
        .map_err(|_| "DREMIO_HOST must be set to connect to Dremio".to_string())?;
    let port = env::var("DREMIO_PORT")
        .map_err(|_| "DREMIO_PORT must be set to connect to Dremio".to_string())?;
    let use_ssl = env::var("DREMIO_USE_SSL")
        .map_err(|_| "DREMIO_USE_SSL must be set to connect to Dremio".to_string())?;
    let username = env::var("DREMIO_USER")
        .map_err(|_| "DREMIO_USER must be set to connect to Dremio".to_string())?;
    let password = env::var("DREMIO_PASSWORD")
        .map_err(|_| "DREMIO_PASSWORD must be set to connect to Dremio".to_string())?;
    let auth = if let Ok(personal_access_token) = env::var("DREMIO_PERSONAL_ACCESS_TOKEN") {
        DremioAuth::UsernamePersonalAccessToken(username, personal_access_token)
    } else {
        DremioAuth::UsernamePassword(username, password)
    };
    let database = crate::databases_dremio::Dremio::new(
        config,
        auth,
        use_ssl.parse().unwrap(),
        host,
        port,
    )
    .await?;
    Ok(Box::new(database))
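A possible tightening of the environment handling above, as a sketch rather than what the PR does: the password is only required when no personal access token is set, and an unparsable DREMIO_USE_SSL returns an error instead of panicking. `DremioEnv`, `read_dremio_env`, and `read_dremio_env_from_process` are hypothetical names, and `DremioAuth` here is a local stand-in for the real enum so the example is self-contained.

```rust
use std::env;

// Local stand-in for the connector's auth enum (assumed shape).
#[derive(Debug, PartialEq)]
pub enum DremioAuth {
    UsernamePassword(String, String),
    UsernamePersonalAccessToken(String, String),
}

// Hypothetical container for the values read from the environment.
#[derive(Debug, PartialEq)]
pub struct DremioEnv {
    pub host: String,
    pub port: String,
    pub use_ssl: bool,
    pub auth: DremioAuth,
}

/// Reads the settings through `lookup`, so tests can pass a map instead of
/// touching the real process environment.
pub fn read_dremio_env<F>(lookup: F) -> Result<DremioEnv, String>
where
    F: Fn(&str) -> Option<String>,
{
    let required = |key: &str| -> Result<String, String> {
        lookup(key).ok_or_else(|| format!("{} must be set to connect to Dremio", key))
    };

    let host = required("DREMIO_HOST")?;
    let port = required("DREMIO_PORT")?;
    // `bool::from_str` accepts exactly "true" or "false".
    let use_ssl = required("DREMIO_USE_SSL")?
        .parse::<bool>()
        .map_err(|e| format!("DREMIO_USE_SSL must be true or false: {}", e))?;
    let username = required("DREMIO_USER")?;

    // Prefer the token when present; only then is the password optional.
    let auth = match lookup("DREMIO_PERSONAL_ACCESS_TOKEN") {
        Some(token) => DremioAuth::UsernamePersonalAccessToken(username, token),
        None => DremioAuth::UsernamePassword(username, required("DREMIO_PASSWORD")?),
    };

    Ok(DremioEnv { host, port, use_ssl, auth })
}

/// Convenience wrapper over the real process environment.
pub fn read_dremio_env_from_process() -> Result<DremioEnv, String> {
    read_dremio_env(|key| env::var(key).ok())
}
```

Passing the lookup as a closure keeps the parsing logic testable without setting global environment variables.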
This outlines the variables you need: DREMIO_HOST, DREMIO_PORT, DREMIO_USE_SSL, DREMIO_USER, and DREMIO_PASSWORD. They can be stored in a .env file:
DREMIO_HOST=localhost
DREMIO_PORT=32010
DREMIO_USE_SSL=false
DREMIO_USER=admin
DREMIO_PASSWORD=fht4jyx9HAY!jxk1ydg
That is what I got working locally for this setup:
// It should be running on the following ports
// docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 -p 45678:45678 dremio/dremio-oss
// 1. Create test space
// 2. Create test folder inside the test space
// 3. Create the samples source
With the config:
dremio:
  dremio_space: test
  dremio_space_folder: test