superset icon indicating copy to clipboard operation
superset copied to clipboard

feat(oauth2): add support for trino

Open joaoferrao opened this issue 1 year ago • 4 comments

SUMMARY

Under https://github.com/apache/superset/pull/27631 under https://github.com/apache/superset/issues/20300 It also fixes an issue not totally resolved here https://github.com/apache/superset/pull/29981, which is required for OAuth2 to work for trino.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

https://github.com/user-attachments/assets/af59ff3f-a38c-4225-a51c-7347f1b42971

TESTING INSTRUCTIONS

  1. I created a Keycloak client for trino and added this configuration in superset_docker_config.py:
DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/"
DATABASE_OAUTH2_CLIENTS = {
    'Trino': {
        'id': 'trino',
        'secret': ''<some-secret>',
        'scope': 'openid email offline_access roles profile',
        'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/',
        'authorization_request_uri': 'https://<the url of keycloak deploy>/realms/master/protocol/openid-connect/auth',
        'token_request_uri': 'https://<the url of keycloak deploy>/realms/master/protocol/openid-connect/token',
       'request_content_type': 'data' # keycloak doesn't accept application/json body.
    }
}
  1. Database configured via UI: with following settings:
trino://<trino_url>:443/tpcds

{"connect_args":{"http_scheme":"https"}}

Impersonate: true

ADDITIONAL INFORMATION

  • [x] Has associated issue: https://github.com/apache/superset/issues/20300
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

Need feedback with:

We still need to trigger this OAuth2 dance in, at least, 2 contexts (I don't know much about superset, possibility there are more):

  • Automatic attempt to list schemas and tables
  • Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to

joaoferrao avatar Aug 31 '24 15:08 joaoferrao

Codecov Report

:x: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 83.91%. Comparing base (76d897e) to head (4011e69). :warning: Report is 2203 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/trino.py 70.58% 5 Missing :warning:
superset/db_engine_specs/base.py 88.88% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #30081       +/-   ##
===========================================
+ Coverage   60.48%   83.91%   +23.42%     
===========================================
  Files        1931      534     -1397     
  Lines       76236    38788    -37448     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32549    -13565     
+ Misses      28017     6239    -21778     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.91% <48.27%> (-0.24%) :arrow_down:
javascript ?
mysql 76.73% <51.72%> (?)
postgres 76.85% <51.72%> (?)
presto 53.41% <48.27%> (-0.40%) :arrow_down:
python 83.91% <79.31%> (+20.40%) :arrow_up:
sqlite 76.31% <51.72%> (?)
unit 60.90% <79.31%> (+3.26%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Sep 03 '24 17:09 codecov[bot]

Looks like @betodealmeida was working on https://github.com/apache/superset/pull/30126 at the same time which might be related -- Beto, are you able to review this one?

sfirke avatar Sep 05 '24 14:09 sfirke

tagging @betodealmeida as he has most context to review this

mistercrunch avatar Sep 06 '24 19:09 mistercrunch

tried my luck fixing the merge conflict in the GitHub web editor, didn't pull the branch and run CI, so no pre-commit checks.... let's see if I get lucky, if not feel free to disregard my commit

mistercrunch avatar Oct 12 '24 00:10 mistercrunch

@villebro @betodealmeida looks like we have a few nits and unresolved questions here, but the thread has gone quiet. If @joaoferrao doesn't respond, would anyone want to block this, or should someone smash that merge button?

rusackas avatar Oct 29 '24 18:10 rusackas

(note that code suggestions can be committed here by the person offering them, btw!)

rusackas avatar Oct 29 '24 18:10 rusackas

Sorry, I will take a look at the comments over the coming days. I’ve been rather busy so I haven’t had time to refocus attention again.

joaoferrao avatar Oct 29 '24 19:10 joaoferrao

@betodealmeida @villebro I have addressed the main comments, I believe.

One question I still made when opening my PR is still open:

Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to

I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the test connection working in the UI - ofc so I could develop the functionality and actually prove the connection it self works.

joaoferrao avatar Oct 30 '24 12:10 joaoferrao

I’ve been working on setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (#30081). However, despite configuring authorization_request_uri, token_request_uri, and handling impersonation, I’m still encountering issues with 401 Unauthorized errors when trying to connect to Trino. My current setup looks like this:

trino_oauth

DATABASE_OAUTH2_REDIRECT_URI: "http://localhost:8088/api/v1/database/oauth2/" DATABASE_OAUTH2_CLIENTS: { 'HSH Trino Oauth': { 'id': '<trino_app_client_id>', 'secret': '<trino_app_client_secret>', 'scope': 'openid email offline_access roles profile', 'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/', 'authorization_request_uri': 'https://example.okta.com/oauth2/v1/authorize', 'token_request_uri': 'https://example.okta.com/oauth2/v1/token' } }

{"connect_args":{"http_scheme":"https"}}

Impersonate: true is in place as well and UI shows the correct configs.

I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated!

Thank you!

anushreegawdeAP avatar Oct 30 '24 21:10 anushreegawdeAP

I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated!

A bit more detail would help fine tune the issue. Can I ask if you, even if redundantly:

  1. Have you confirmed trino is working with OAuth2 without superset? Meaning: trino cli with trino --server='<your trino endpoint uri>' --external-authentication=true?
  2. Does that work with the exact same user your are trying to impersonate from superset?
  3. Have you configured superset to use the same "username" as you are using for trino?

If you are sure you are getting the access token back from the IdP (after being requested to authenticate), then this is what is sent to trino:

        if backend_name == "trino" and username is not None:
            connect_args["user"] = username
            if access_token is not None:
                http_session = requests.Session()
                http_session.headers.update({"Authorization": f"Bearer {access_token}"})
                connect_args["http_session"] = http_session

joaoferrao avatar Oct 30 '24 22:10 joaoferrao

@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? Have you configured superset to use the same "username" as you are using for trino? how do i verify this?

anushreegawdeAP avatar Oct 31 '24 00:10 anushreegawdeAP

I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the test connection working in the UI - ofc so I could develop the functionality and actually prove the connection it self works.

Yeah, this was a chicken-and-egg problem. The token is stored associated with the database ID, so you can't do OAuth2 before the database is created. But for the database to be created, we require a successful connection, which in this case is impossible.

I added some logic in https://github.com/apache/superset/pull/30071 for Snowflake so that if the test connection fails because OAuth2 is needed then the test connection command succeeds:

https://github.com/apache/superset/pull/30071/files#diff-627a6d549d121af6805862d2edcf1fbbbc68c6df72013a81e67cc8f771b85717R166-R171

Wasn't this logic enough for the same flow with Trino?

betodealmeida avatar Oct 31 '24 18:10 betodealmeida

@betodealmeida that is indeed very similar to what I did for me to test the feature, just that I didn't publish it, because I didn't delve enough to the test_connection and it would impact more than just trino.

Thanks for the feedback. I see all checks are passed and my PR is still approved. Do I assume correctly I should merge it?

joaoferrao avatar Oct 31 '24 19:10 joaoferrao

@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? Have you configured superset to use the same "username" as you are using for trino? how do i verify this?

Take what I'm saying with a grain of salt, but the point of impersonation is for your superset user to be impersonated upon the request to trino. Whatever the username is for superset, is going to be used in the --user argument of the trino connection. Currently, this username, depending on your Superset settings will either your Superset username or everything in your email address up until the @.

joaoferrao avatar Oct 31 '24 19:10 joaoferrao

@joaoferrao I did verify the command trino --server='' --external-authentication=true with my okta username and that works on the browser but throws cannot impersonate user error in the end on the cluster. Tried with making that setting false still no luck. Will adding 'request_content_type': 'application/x-www-form-urlencoded' make any difference instead of data? i suspect its not retrieving the token correctly. I have also checked a snippet of log and The log does not show an Authorization: Bearer <access_token> header in the request.

anushreegawdeAP avatar Oct 31 '24 23:10 anushreegawdeAP

and that works on the browser but throws cannot impersonate user error in the end on the cluster

Does it or doesn't in work when you remove superset from the equation? The previous comment left me a bit confused about the scenario you described.

  • This should work with standalone trino + okta: trino --server='<your trino endpoint uri>' --external-authentication=true --user=<username_used_by_superset> (notice the last argument) + execute some query such as SHOW CATLAOGS; to confirm you have permissions to see the catalogs. This username must be the same that is injected by superset in the query. If it doesn't work, you may have a problem with Trino setup.
  • adding 'request_content_type': 'application/x-www-form-urlencoded': this is what data does. This is used for the request from superset to okta to retrieve the tokens. This doesn't affect impersonation directly.
  • You mentioned in your initial comment that "I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta": this would mean that superset is correctly retrieving the token from okta, so there should be no issue with the data type of the request.

joaoferrao avatar Nov 01 '24 10:11 joaoferrao

@joaoferrao thanks for all the pointers. Noticed that trino is using the email and superset uses okta token as the username hence the mismatch and 401. I will need to change the setup on either trino or superset side to get the authentication correct.

anushreegawdeAP avatar Nov 01 '24 16:11 anushreegawdeAP

Thanks for working on this, @joaoferrao!

betodealmeida avatar Nov 04 '24 16:11 betodealmeida

Testing Connection when adding the database: temp: previous OAuth2 features implemented and already merged don't include a way to trigger this flow when adding a connection via UI. For this reason, I had to hack the test_connection.py so I'm allowed to

Hello @joaoferrao, do you have a solution to create the connection to Trino with oauth 2 enabled in a production environment ?

datasc24 avatar Dec 09 '24 14:12 datasc24

@joaoferrao @betodealmeida greetings!

I asked sometime ago some questions about setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (https://github.com/apache/superset/pull/30081). We have got past the username part and now superset recognizes email as username. I am having trouble now setting up the database connection. It keeps throwing unauthorized error. My current setup looks like this:

trino_oauth DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/" DATABASE_OAUTH2_JWT_ALGORITHM = "RS256" DATABASE_OAUTH2_CLIENTS = { 'HSH Trino Oauth': { 'id': '<trino_app_client_id>', 'secret': '<trino_app_client_secret>', 'scope': 'openid email offline_access roles profile', 'redirect_uri': 'http://localhost:8088/api/v1/database/oauth2/', 'authorization_request_uri': 'https://example.okta.com/oauth2/v1/authorize', 'token_request_uri': 'https://example.okta.com/oauth2/v1/token' 'request_content_type': 'json' } }

{"connect_args":{"http_scheme":"https"}}

Impersonate: true is in place as well and UI shows the correct configs. Any guidance or insights on further troubleshooting would be greatly appreciated!

Thank you!

anushreegawdeAP avatar Mar 04 '25 21:03 anushreegawdeAP