Athena Data Source: support Catalog
PyAthena added support for Catalogs in https://github.com/laughingman7743/PyAthena/issues/220 and it'd be great if redash also supported them.
What I think needs to be done:
- [ ] Update athena.py to include catalog_name: https://github.com/getredash/redash/blob/f21f7e211ff5fe12c850a9627cfadb80a5a8819e/redash/query_runner/athena.py#L221-L229
- [ ] Update UI to include a new text field to enter a catalog name
- [ ] Persist the new field in DB
- Redash version: 10.0
I agree this looks like a good addition. Since we don't use Athena on the core team we need someone in the community to write and verify the changes. But we've updated our documentation about writing a query runner which should make this straightforward. In summary:
- Add catalog_name to the configuration schema. This will automatically add it to the data source connection screen and persist it to the database.
- Update the pyathena.connect() call to use the value of self.configuration["catalog_name"].
- Establish a default behaviour so that existing Athena data sources won't break, since they have no catalog_name specified.
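A minimal sketch of these three steps, assuming a JSON-schema-style configuration_schema like the one Redash query runners use. The schema fragment is simplified, and the helper build_connect_kwargs is hypothetical, not the merged implementation. The idea is that its result would be passed as keyword arguments to pyathena.connect(), assuming the installed PyAthena version accepts catalog_name (the support added in the PyAthena issue linked above).

```python
from typing import Any, Dict

# Hypothetical, simplified fragment of the Athena configuration schema.
# Adding a property here is what makes the field appear on the data source screen.
CONFIGURATION_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "region": {"type": "string", "title": "AWS Region"},
        "s3_staging_dir": {"type": "string", "title": "S3 Staging Path"},
        "schema": {"type": "string", "title": "Schema Name", "default": "default"},
        # New optional field; leaving it out of "required" keeps old data sources valid.
        "catalog_name": {"type": "string", "title": "Catalog Name"},
    },
    "required": ["region", "s3_staging_dir"],
}


def build_connect_kwargs(configuration: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for pyathena.connect() from a saved configuration."""
    kwargs: Dict[str, Any] = {
        "region_name": configuration["region"],
        "s3_staging_dir": configuration["s3_staging_dir"],
        "schema_name": configuration.get("schema", "default"),
    }
    # Data sources saved before catalog_name existed lack the key, so use .get().
    # When the value is missing or blank, omit it so PyAthena uses its own default.
    catalog_name = (configuration.get("catalog_name") or "").strip()
    if catalog_name:
        kwargs["catalog_name"] = catalog_name
    return kwargs
```

Omitting the key instead of hard-coding a default catalog name means existing data sources keep whatever behaviour PyAthena already gave them.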
If you'd like to give it a try I'm happy to help with any questions, and I look forward to reviewing and merging this soon.
Is anyone working on this issue? Happy to take a shot
Awesome @yongchand thank you! Nobody else has picked this up. Feel free to ping me with any questions along the way.
@susodapop Just out of curiosity, isn't this PR related? https://github.com/getredash/redash/pull/5741#issue-1213704982
🤦 You're correct.
If that PR merges will it meet your need? Can you check out the change and try it (cautiously)?
@susodapop Sure. I can run it on Docker and see if it works. But where should we insert these extra_options?
You insert the options on the data source setup screen in settings. That pull request modifies the query runner configuration_schema, which controls what options are displayed there.
@susodapop Sorry again, are there any instructions for testing code using Docker? (Not using a pre-existing image.)
Yes, you can use our Docker development loop: https://redash.io/help/open-source/dev-guide/docker
The only difference is that here you will need to check out that pull request's code instead of master.
This is easiest if you install the GitHub CLI (gh) on your machine. Then right after the setup step where you clone redash you will run:
```
gh pr checkout 5741
```
This pulls that feature branch onto your machine. Then, when you run docker-compose up -d, it will spin up the containers using your local code, and any changes you make will be visible when you browse to localhost:5000.
@susodapop Sorry for bugging you again. It seems I have successfully run Redash locally via Docker. However, I can't see Athena in the data source list, even though I can see numerous other data sources like Prometheus. Is this behavior expected?
No worries, this is what I'm here for :) I'm happy to jump on a call with you if that will help. Just ping me [email protected] and I'll send you a meeting invitation.
The athena data source is enabled by default. It will not appear in the data source list in the following cases:
- It's not included in the default data source list in redash/settings/__init__.py
- One of its Python dependencies could not be loaded
- It's specifically excluded with the REDASH_DISABLED_QUERY_RUNNERS environment variable.
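The three conditions can be summarized with a simplified model. This is a hypothetical sketch, not Redash's actual settings code: visible_query_runners, module_is_importable, and the runner-to-dependency mapping are illustrative assumptions. Only the REDASH_DISABLED_QUERY_RUNNERS variable name comes from the list above, and treating it as comma-separated is also an assumption.

```python
import importlib
from typing import Callable, Dict, List, Mapping


def module_is_importable(name: str) -> bool:
    """Return True if the module can actually be imported (not just found)."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def visible_query_runners(
    default_runners: List[str],
    dependencies: Dict[str, str],
    env: Mapping[str, str],
    is_importable: Callable[[str], bool] = module_is_importable,
) -> List[str]:
    # Condition 3: runners explicitly excluded via the environment variable.
    disabled = {
        item.strip()
        for item in env.get("REDASH_DISABLED_QUERY_RUNNERS", "").split(",")
        if item.strip()
    }
    visible = []
    for runner in default_runners:  # Condition 1: only runners in the default list.
        if runner in disabled:
            continue
        dependency = dependencies.get(runner)
        # Condition 2: hide runners whose Python dependency fails to import.
        if dependency is not None and not is_importable(dependency):
            continue
        visible.append(runner)
    return visible
```

Under this model a failed build that leaves pyathena uninstalled would hide Athena while leaving every other runner visible, which matches the symptom described above.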
Also FWIW, I can load these changes when I check out that PR branch, so it's likely something failed during your docker-compose up -d command.
@susodapop Could running on an M1 Mac affect this? I will take one last shot, and if it fails we can arrange a meeting.
Oh yes, that will certainly affect things (but not for long). The build step on M1 Macs currently fails on master, which would cause exactly the outcome you see.
I've made a pull request that fixes it: https://github.com/getredash/redash/pull/5788
Since it hasn't merged to master yet, you can do the following to apply the fix at the same time as the Athena upgrade pull request.
```
# starting from an empty folder
git clone https://github.com/getredash/redash.git
cd redash/
gh pr checkout 5741
curl https://patch-diff.githubusercontent.com/raw/getredash/redash/pull/5788.patch > 0001-m1-fix.patch
git am 0001-m1-fix.patch
```
This will apply the M1 build fix as an extra commit. Then just run docker-compose build to rebuild your containers. When that finishes, docker-compose up -d will work :)
@susodapop I just tested it and can confirm that it works! I successfully connected two different catalogs in my Athena account. Could you merge the PR and publish a new Docker image, if possible?
Is this still ongoing?
Has this been fixed?