metabase-athena-driver icon indicating copy to clipboard operation
metabase-athena-driver copied to clipboard

Connection to multiple catalogs not supported

Open jainanuj07 opened this issue 4 years ago • 8 comments

I have made a connection using the driver in metabase but it showing me all the database and tables in default AwsDataCatalog.

I have created other catalogs also in athena using hive metastore but that is not visible in metabase. How can i specify catalog name that i want to connect.

jainanuj07 avatar Jul 29 '20 08:07 jainanuj07

Multiple catalogs weren't a thing when I first wrote the driver, so I think I have to dig into how multiple catalogs/schemas are supported in Metabase and see how I could do the same here. It should be feasible, but this will remain on the backlog for now as you can write native SQL queries to access other catalogs as a workaround.

dacort avatar Jul 30 '20 06:07 dacort

Hi, how are you? Do you have any predictions, or any tips on how to make this improvement?

guilhermetavares avatar May 25 '21 21:05 guilhermetavares

I looked into this a little bit more today. As noted in this comment, I previously tried using the Catalog and MetadataRetrievalMethod properties to see if it would work, but didn't have success.

I've learned a bit more since then:

  • MetadataRetrievalMethod will return the extra catalogs, but the existing code can't handle them.
  • In order to support federated queries, I had to add a new EXTERNAL table type to the get-tables method
  • Catalog setting could help but I believe it only sets the default catalog for actual queries, not database/table metadata

In order to make this feasible, a rewrite to the entire set of table metadata methods is required starting from driver/describe-database. A few notes on that:

  • fast-active-tables will list all the catalogs, but when passing the schema/database names to get-tables, it ignores the catalog name that is returned by the API.
  • fast-active-tables can take a catalog name, but we currently just pass in "nil"
  • Even if we did pass in a catalog name, say via a connection parameter, describe-table still needs to be rewritten to handle that.

I was able to manage to get it to "mostly" work (DB sync worked, tables and columns showed up, and I could query the table) by hardcoding in a catalog name in driver/describe-database and driver/describe-table, so that's promising.

That leaves a couple options:

  • Add a connection parameter to specify a different catalog then the default
    • Low effort, but I need to figure out how to get the catalog name to the describe-table method
  • "Truly" support multiple catalogs
    • high effort, full rewrite

That said, I certainly don't have the bandwidth for the second option. I'll look into the first a little bit more to see if it's feasible. It would be nice to be able to at least query different catalogs.

dacort avatar May 20 '22 22:05 dacort

I've looked into adding a connection parameter for a different catalog and have been able to make it work! 🙌

If you're interested in testing it out, leave a 👍 reaction here and I'll make a test release. Otherwise, I'll probably wrap it up into the next big release.

image

I am actually surprisingly close to making multiple catalogs work as well, but still a bit harder. :)

dacort avatar May 28 '22 21:05 dacort

Hi @dacort. This plugin is pretty awesome. I'll be happy to help you by testing multiple catalogs. Where can I find the testing release?

pmoralesp avatar Sep 22 '22 03:09 pmoralesp

Hi @pmoralesp - after chatting with the MB team, I don't think multiple catalogs is an option (due to no way to show it in the UI). But I could potentially wrap up the option to specify a non-default catalog and upload a release. Let me know if that's still of interest.

dacort avatar Sep 22 '22 16:09 dacort

That would be very usefull!!. I tried with Catalog=<non_default_catalog> as "Additional Athena connection string options", but doesn't worked.

pmoralesp avatar Sep 22 '22 17:09 pmoralesp

Yea, there was a lot of extra work in the core of the driver I had to change to support a different catalog. I'll see if I can clean it up and get something out by next week.

dacort avatar Sep 22 '22 21:09 dacort

@pmoralesp I've built an image that can be used to test this out. You can run it with Docker with the following command:

docker run -p 3000:3000 ghcr.io/dacort/metabase-athena-driver:sha-4814b40

When you go to setup Athena, there should be a new "Catalog" option. Just enter the name of your additional catalog (or as it's shown in the Athena console, data source).

image

In addition, I've moved the access key and secret credentials behind a toggle box as in a production environments, folks should be using Instance profile credentials.

Feel free to give it a shot and post here if you have any issues.

dacort avatar Oct 03 '22 17:10 dacort

It works like a charm! First try and I be able to retrive ~60 tables from DynamoDB through Athena. Thanks @dacort

pmoralesp avatar Oct 03 '22 18:10 pmoralesp

Specific catalogs can now be selected as part of the v1.5.0 release. Also closing as part of cleanup now that Athena is officially supported by Metabase.

dacort avatar Dec 08 '22 18:12 dacort