feat(oauth2): add support for trino
SUMMARY
Under https://github.com/apache/superset/pull/27631 under https://github.com/apache/superset/issues/20300 It also fixes an issue not totally resolved here https://github.com/apache/superset/pull/29981, which is required for OAuth2 to work for trino.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
https://github.com/user-attachments/assets/af59ff3f-a38c-4225-a51c-7347f1b42971
TESTING INSTRUCTIONS
- I created a Keycloak client for trino and added this configuration in
superset_docker_config.py:
DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/"
DATABASE_OAUTH2_CLIENTS = {
'Trino': {
'id': 'trino',
'secret': ''<some-secret>',
'scope': 'openid email offline_access roles profile',
'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/',
'authorization_request_uri': 'https://<the url of keycloak deploy>/realms/master/protocol/openid-connect/auth',
'token_request_uri': 'https://<the url of keycloak deploy>/realms/master/protocol/openid-connect/token',
'request_content_type': 'data' # keycloak doesn't accept application/json body.
}
}
- Database configured via UI: with following settings:
trino://<trino_url>:443/tpcds
{"connect_args":{"http_scheme":"https"}}
Impersonate: true
ADDITIONAL INFORMATION
- [x] Has associated issue: https://github.com/apache/superset/issues/20300
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Need feedback with:
We still need to trigger this OAuth2 dance in, at least, 2 contexts (I don't know much about superset, possibility there are more):
- Automatic attempt to list schemas and tables
- Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to
Codecov Report
:x: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 83.91%. Comparing base (76d897e) to head (4011e69).
:warning: Report is 2203 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/db_engine_specs/trino.py | 70.58% | 5 Missing :warning: |
| superset/db_engine_specs/base.py | 88.88% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #30081 +/- ##
===========================================
+ Coverage 60.48% 83.91% +23.42%
===========================================
Files 1931 534 -1397
Lines 76236 38788 -37448
Branches 8568 0 -8568
===========================================
- Hits 46114 32549 -13565
+ Misses 28017 6239 -21778
+ Partials 2105 0 -2105
| Flag | Coverage Δ | |
|---|---|---|
| hive | 48.91% <48.27%> (-0.24%) |
:arrow_down: |
| javascript | ? |
|
| mysql | 76.73% <51.72%> (?) |
|
| postgres | 76.85% <51.72%> (?) |
|
| presto | 53.41% <48.27%> (-0.40%) |
:arrow_down: |
| python | 83.91% <79.31%> (+20.40%) |
:arrow_up: |
| sqlite | 76.31% <51.72%> (?) |
|
| unit | 60.90% <79.31%> (+3.26%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Looks like @betodealmeida was working on https://github.com/apache/superset/pull/30126 at the same time which might be related -- Beto, are you able to review this one?
tagging @betodealmeida as he has most context to review this
tried my luck fixing the merge conflict in the GitHub web editor, didn't pull the branch and run CI, so no pre-commit checks.... let's see if I get lucky, if not feel free to disregard my commit
@villebro @betodealmeida looks like we have a few nits and unresolved questions here, but the thread has gone quiet. If @joaoferrao doesn't respond, would anyone want to block this, or should someone smash that merge button?
(note that code suggestions can be committed here by the person offering them, btw!)
Sorry, I will take a look at the comments over the coming days. I’ve been rather busy so I haven’t had time to refocus attention again.
@betodealmeida @villebro I have addressed the main comments, I believe.
One question I still made when opening my PR is still open:
Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to
I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the test connection working in the UI - ofc so I could develop the functionality and actually prove the connection it self works.
I’ve been working on setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (#30081). However, despite configuring authorization_request_uri, token_request_uri, and handling impersonation, I’m still encountering issues with 401 Unauthorized errors when trying to connect to Trino. My current setup looks like this:
trino_oauth
DATABASE_OAUTH2_REDIRECT_URI: "http://localhost:8088/api/v1/database/oauth2/" DATABASE_OAUTH2_CLIENTS: { 'HSH Trino Oauth': { 'id': '<trino_app_client_id>', 'secret': '<trino_app_client_secret>', 'scope': 'openid email offline_access roles profile', 'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/', 'authorization_request_uri': 'https://example.okta.com/oauth2/v1/authorize', 'token_request_uri': 'https://example.okta.com/oauth2/v1/token' } }
{"connect_args":{"http_scheme":"https"}}
Impersonate: true is in place as well and UI shows the correct configs.
I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated!
Thank you!
I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated!
A bit more detail would help fine tune the issue. Can I ask if you, even if redundantly:
- Have you confirmed trino is working with OAuth2 without superset? Meaning: trino cli with
trino --server='<your trino endpoint uri>' --external-authentication=true? - Does that work with the exact same user your are trying to impersonate from superset?
- Have you configured superset to use the same "username" as you are using for trino?
If you are sure you are getting the access token back from the IdP (after being requested to authenticate), then this is what is sent to trino:
if backend_name == "trino" and username is not None:
connect_args["user"] = username
if access_token is not None:
http_session = requests.Session()
http_session.headers.update({"Authorization": f"Bearer {access_token}"})
connect_args["http_session"] = http_session
@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? Have you configured superset to use the same "username" as you are using for trino? how do i verify this?
I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the
test connectionworking in the UI - ofc so I could develop the functionality and actually prove the connection it self works.
Yeah, this was a chicken-and-egg problem. The token is stored associated with the database ID, so you can't do OAuth2 before the database is created. But for the database to be created, we require a successful connection, which in this case is impossible.
I added some logic in https://github.com/apache/superset/pull/30071 for Snowflake so that if the test connection fails because OAuth2 is needed then the test connection command succeeds:
https://github.com/apache/superset/pull/30071/files#diff-627a6d549d121af6805862d2edcf1fbbbc68c6df72013a81e67cc8f771b85717R166-R171
Wasn't this logic enough for the same flow with Trino?
@betodealmeida that is indeed very similar to what I did for me to test the feature, just that I didn't publish it, because I didn't delve enough to the test_connection and it would impact more than just trino.
Thanks for the feedback. I see all checks are passed and my PR is still approved. Do I assume correctly I should merge it?
@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? Have you configured superset to use the same "username" as you are using for trino? how do i verify this?
Take what I'm saying with a grain of salt, but the point of impersonation is for your superset user to be impersonated upon the request to trino. Whatever the username is for superset, is going to be used in the --user argument of the trino connection. Currently, this username, depending on your Superset settings will either your Superset username or everything in your email address up until the @.
@joaoferrao I did verify the command trino --server='
and that works on the browser but throws cannot impersonate user error in the end on the cluster
Does it or doesn't in work when you remove superset from the equation? The previous comment left me a bit confused about the scenario you described.
- This should work with standalone trino + okta:
trino --server='<your trino endpoint uri>' --external-authentication=true --user=<username_used_by_superset>(notice the last argument) + execute some query such asSHOW CATLAOGS;to confirm you have permissions to see the catalogs. This username must be the same that is injected by superset in the query. If it doesn't work, you may have a problem with Trino setup. adding 'request_content_type': 'application/x-www-form-urlencoded': this is whatdatadoes. This is used for the request from superset to okta to retrieve the tokens. This doesn't affect impersonation directly.- You mentioned in your initial comment that "I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta": this would mean that superset is correctly retrieving the token from okta, so there should be no issue with the
datatype of the request.
@joaoferrao thanks for all the pointers. Noticed that trino is using the email and superset uses okta token as the username hence the mismatch and 401. I will need to change the setup on either trino or superset side to get the authentication correct.
Thanks for working on this, @joaoferrao!
Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to
Hello @joaoferrao, do you have a solution to create the connection to Trino with oauth 2 enabled in a production environment ?
@joaoferrao @betodealmeida greetings!
I asked sometime ago some questions about setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (https://github.com/apache/superset/pull/30081). We have got past the username part and now superset recognizes email as username. I am having trouble now setting up the database connection. It keeps throwing unauthorized error. My current setup looks like this:
trino_oauth DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/" DATABASE_OAUTH2_JWT_ALGORITHM = "RS256" DATABASE_OAUTH2_CLIENTS = { 'HSH Trino Oauth': { 'id': '<trino_app_client_id>', 'secret': '<trino_app_client_secret>', 'scope': 'openid email offline_access roles profile', 'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/', 'authorization_request_uri': 'https://example.okta.com/oauth2/v1/authorize', 'token_request_uri': 'https://example.okta.com/oauth2/v1/token' 'request_content_type': 'json' } }
{"connect_args":{"http_scheme":"https"}}
Impersonate: true is in place as well and UI shows the correct configs. Any guidance or insights on further troubleshooting would be greatly appreciated!
Thank you!