Athena Data Source: support Catalog

Open VasilyFomin opened this issue 2 years ago • 17 comments

PyAthena added support for Catalogs in https://github.com/laughingman7743/PyAthena/issues/220 and it'd be great if redash also supported them.

What I think needs to be done:

  • [ ] Update athena.py to include catalog_name https://github.com/getredash/redash/blob/f21f7e211ff5fe12c850a9627cfadb80a5a8819e/redash/query_runner/athena.py#L221-L229
  • [ ] Update UI to include a new text field to enter a catalog name
  • [ ] Persist the new field in DB
  • Redash version: 10.0

VasilyFomin avatar Jan 10 '22 16:01 VasilyFomin

I agree this looks like a good addition. Since we don't use Athena on the core team we need someone in the community to write and verify the changes. But we've updated our documentation about writing a query runner which should make this straightforward. In summary:

  1. Add catalog_name to the configuration schema. This will automatically add it to the data source connection screen and persist it to the database.
  2. Update the pyathena.connect() call to use the value of self.configuration["catalog_name"].
  3. Establish a default behaviour so that existing Athena data sources, which have no catalog_name specified, won't break.
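
A minimal sketch of how those three steps might fit together. This is not the actual redash/query_runner/athena.py code: the helper build_connect_kwargs, the trimmed-down schema, and the default value "awsdatacatalog" are assumptions made for illustration. The resulting keyword arguments are meant to be passed to pyathena.connect(), which is deliberately not called here so the example runs without AWS credentials.

```python
# Illustrative sketch only; not the actual redash/query_runner/athena.py code.

# Assumption: "awsdatacatalog" is the catalog existing data sources use implicitly.
DEFAULT_CATALOG = "awsdatacatalog"


def configuration_schema():
    """Return a trimmed-down schema with the new catalog_name field added."""
    return {
        "type": "object",
        "properties": {
            "region": {"type": "string", "title": "AWS Region"},
            "s3_staging_dir": {"type": "string", "title": "S3 Staging Path"},
            "schema": {"type": "string", "title": "Schema Name", "default": "default"},
            # Step 1: the new field, optional so that old data sources stay valid.
            "catalog_name": {
                "type": "string",
                "title": "Catalog Name",
                "default": DEFAULT_CATALOG,
            },
        },
        "required": ["region", "s3_staging_dir"],
    }


def build_connect_kwargs(configuration):
    """Build keyword arguments intended for pyathena.connect(**kwargs)."""
    return {
        "s3_staging_dir": configuration["s3_staging_dir"],
        "region_name": configuration["region"],
        "schema_name": configuration.get("schema", "default"),
        # Steps 2 and 3: use the configured catalog, falling back to the default
        # when the key is missing or empty (data sources saved before this change).
        "catalog_name": configuration.get("catalog_name") or DEFAULT_CATALOG,
    }


# Hypothetical configurations: one saved before the change, one after.
old_source = {"region": "us-east-1", "s3_staging_dir": "s3://example-bucket/results/"}
new_source = dict(old_source, catalog_name="my_catalog")

print(build_connect_kwargs(old_source)["catalog_name"])  # awsdatacatalog
print(build_connect_kwargs(new_source)["catalog_name"])  # my_catalog
```

Using .get() with a fallback rather than indexing self.configuration["catalog_name"] directly is what keeps existing data sources from raising a KeyError.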

If you'd like to give it a try I'm happy to help with any questions. And look forward to reviewing and merging this soon.

susodapop avatar Mar 12 '22 12:03 susodapop

Is anyone working on this issue? Happy to take a shot

yongchand avatar Jul 19 '22 17:07 yongchand

Awesome @yongchand thank you! Nobody else has picked this up. Feel free to ping me with any questions along the way.

susodapop avatar Jul 19 '22 17:07 susodapop

@susodapop Just out of curiosity, isn't this PR related? https://github.com/getredash/redash/pull/5741#issue-1213704982

yongchand avatar Jul 20 '22 14:07 yongchand

🤦 You're correct.

If that PR merges will it meet your need? Can you check out the change and try it (cautiously)?

susodapop avatar Jul 20 '22 14:07 susodapop

@susodapop Sure. I can run it on Docker and see if it works. But where should we insert these extra_options?

yongchand avatar Jul 20 '22 14:07 yongchand

You insert the options on the data source setup screen in settings. That pull request modifies the query runner configuration_schema, which controls what options are displayed there.

susodapop avatar Jul 20 '22 14:07 susodapop

@susodapop Sorry again, are there any instructions for testing code using Docker? (Not using a pre-existing image.)

yongchand avatar Jul 20 '22 16:07 yongchand

Yes, you can run our Docker development devloop: https://redash.io/help/open-source/dev-guide/docker

The only difference is that here you will need to check out that pull request's code instead of master.

This is easiest if you install the GitHub CLI (gh) on your machine. Then right after the setup step where you clone redash you will run:

gh pr checkout 5741

Which will pull that feature branch onto your machine. Then when you run docker-compose up -d it will spin up the containers using your local code, and any changes you make will be visible when you browse to localhost:5000.

susodapop avatar Jul 20 '22 17:07 susodapop

@susodapop Sorry to keep bugging you. It seems I have successfully run Redash in my local environment via Docker. However, I can't see Athena in the data source list, even though I can see numerous other data sources like Prometheus. Is this behavior expected?

yongchand avatar Jul 20 '22 17:07 yongchand

No worries, this is what I'm here for :) I'm happy to jump on a call with you if that will help. Just ping me [email protected] and I'll send you a meeting invitation.

The athena data source is enabled by default. It will not appear in the data source list in the following cases:

  • It's not included in the default data source list in redash/settings/__init__.py
  • One of its Python dependencies could not be loaded
  • It's specifically excluded with the REDASH_DISABLED_QUERY_RUNNERS environment variable.
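
A rough way to check those three cases from a Python shell, as a sketch only. The function why_runner_is_hidden and the module path "redash.query_runner.athena" are assumptions for illustration and are not part of Redash; the default list and the environment variable value are passed in by hand rather than read from redash/settings/__init__.py. Note that importlib.util.find_spec only tells you whether the dependency can be found, not whether importing it succeeds.

```python
import importlib.util

# Assumption: the name under which the Athena runner is listed in the settings.
ATHENA_RUNNER = "redash.query_runner.athena"


def parse_runner_list(value):
    """Split a comma-separated environment variable value into clean names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def why_runner_is_hidden(default_runners, disabled_env, dependency="pyathena",
                         runner=ATHENA_RUNNER):
    """Return the reasons (possibly none) a runner would be missing from the list."""
    reasons = []
    # Case 1: the runner is not in the default data source list.
    if runner not in default_runners:
        reasons.append("not in the default data source list")
    # Case 2: the Python dependency cannot be located in this environment.
    if importlib.util.find_spec(dependency) is None:
        reasons.append(f"Python dependency '{dependency}' could not be found")
    # Case 3: the runner is named in REDASH_DISABLED_QUERY_RUNNERS.
    if runner in parse_runner_list(disabled_env):
        reasons.append("excluded via REDASH_DISABLED_QUERY_RUNNERS")
    return reasons


# Example call; inside the Redash container you could pass
# os.environ.get("REDASH_DISABLED_QUERY_RUNNERS", "") as the second argument.
for reason in why_runner_is_hidden([ATHENA_RUNNER], ""):
    print(reason)
```

An empty result means none of the three cases applies, which points at a failed build rather than configuration.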

susodapop avatar Jul 20 '22 17:07 susodapop

Also FWIW, I can load these changes when I check out that PR branch, so it's likely something failed during your docker-compose up -d command.


susodapop avatar Jul 20 '22 17:07 susodapop

@susodapop Could running on an M1 Mac affect this? I will take one last shot, and if it fails we can arrange a meeting.

yongchand avatar Jul 20 '22 17:07 yongchand

Oh yes, that will certainly affect things (but not for long). The build step on M1 Macs currently fails on master, which would cause exactly the outcome you see.

I've made a pull request that fixes it: https://github.com/getredash/redash/pull/5788

Since it hasn't merged to master yet, you can do the following to apply the fix at the same time as the Athena upgrade pull request.

# starting from an empty folder
git clone https://github.com/getredash/redash.git
cd redash/
gh pr checkout 5741
curl https://patch-diff.githubusercontent.com/raw/getredash/redash/pull/5788.patch > 0001-m1-fix.patch
git am 0001-m1-fix.patch

This will apply the M1 build fix as an extra commit. Then you just run docker-compose build to rebuild your containers. When that finishes docker-compose up -d will work :)

susodapop avatar Jul 20 '22 18:07 susodapop

@susodapop I just tested and can confirm that it works! I was able to connect to two different catalogs in my Athena. Can you merge the PR and distribute a new Docker image if possible?

yongchand avatar Jul 21 '22 01:07 yongchand

Is this still ongoing?

mirkan1 avatar May 03 '23 01:05 mirkan1

Has this been fixed?

gss2002 avatar May 01 '24 00:05 gss2002