Create a login flow via pyDataverse to retrieve an API Token via Dataverse/a browser
In https://dataverse.zulipchat.com/#narrow/stream/377090-python/topic/auth.20options, we discussed different options to ease the access to Dataverse with pyDataverse / CLI / API clients.
One of the options is similar to kubectl's and nomad's authentication mechanisms: You open a browser to retrieve a (bearer) token, log in to your favorite OIDC provider with a callback to localhost, and get the token passed to a temporarily started local webserver.
In this issue we want to track ideas about how to make this work.
Yesterday, @JR-1991 and I had a productive programming session around the OIDC login. It seems that (at least in the example setup) no fixed redirect_uri is stored for Dataverse in Keycloak, or at least Keycloak does not check it. This should allow us to "intercept" the login flow (not in a bad way, don't get me wrong) and retrieve the cookie response with the mechanism which I outlined above and which we discussed in Zulip.
When doing a login via OIDC on Dataverse via the browser, the process is roughly like this:
sequenceDiagram
participant B as Browser
participant D as Dataverse
participant K as Keycloak/IdP
B->>D: requests OIDC information
D->>B: returns auth URL and start code as <a href>
B->>K: opens auth URL
K->>B: returns login page
B->>K: submits credentials
K->>B: returns response including result code with redirect to redirect URI
B->>D: follows redirect, passing result code
D->>K: passes result code
K->>D: returns identity information as JWT
D->>B: responds with Cookie containing JWT
We managed to replace the browser for some of the steps, allowing us to a) find out the Dataverse Client ID, which tells Keycloak for which client we require credentials, and b) retrieve the Cookie Dataverse sets, so that pyDataverse rather than the browser ends up with the Bearer token.
sequenceDiagram
participant P as pyDataverse
participant B as Browser
participant D as Dataverse
participant K as Keycloak/IdP
P->>D: requests OIDC information
D->>P: returns auth URL and start code as <a href>
P->>P: rewrites redirect URL
create participant L as local server
P->>L: starts local server
P->>B: opens auth URL in browser
B->>K: opens auth URL
K->>B: returns login page
B->>K: submits credentials
K->>B: returns response including result code with redirect to redirect URI
B->>L: follows redirect, passing result code
L->>P: passes result code
destroy L
P->>L: stops local server
P->>D: passes result code
D->>K: passes result code
K->>D: returns identity information as JWT
D->>P: responds with Cookie containing JWT
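The "rewrites redirect URL" and "local server" steps of this flow could look roughly like the sketch below. It is not the code from our session: the function names, the `/callback` path and the `code` parameter name are assumptions for illustration, and it only uses the Python standard library. The server binds to the loopback interface, handles exactly one request, and keeps the query parameters of that request so that pyDataverse can pass the result code on to Dataverse.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def rewrite_redirect_uri(auth_url: str, redirect_uri: str) -> str:
    """Return auth_url with its redirect_uri query parameter replaced."""
    parts = urlsplit(auth_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["redirect_uri"] = [redirect_uri]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


class _CallbackHandler(BaseHTTPRequestHandler):
    """Stores the query parameters of the redirect on the server object."""

    def do_GET(self):
        params = parse_qs(urlsplit(self.path).query)
        # Keep only the first value of each parameter (e.g. "code", "state").
        self.server.result = {key: values[0] for key, values in params.items()}
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Login complete, you can close this tab.")

    def log_message(self, format, *args):
        # Silence the default request logging on stderr.
        pass


def start_callback_server(port: int = 0):
    """Start a one-shot server on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), _CallbackHandler)
    server.result = None
    # handle_request() serves a single request and then returns.
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    return server, thread


# Hypothetical usage, with auth_url taken from the Dataverse OIDC information:
#
#   import webbrowser
#   server, thread = start_callback_server()
#   redirect_uri = f"http://127.0.0.1:{server.server_address[1]}/callback"
#   webbrowser.open(rewrite_redirect_uri(auth_url, redirect_uri))
#   thread.join(timeout=300)   # wait for the user to finish logging in
#   server.server_close()
#   code = (server.result or {}).get("code")
```

Serving only a single request keeps the local server's lifetime as short as in the diagram. A real implementation would probably also want to verify the `state` parameter before trusting the result code.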
We got a toy example for this flow working, but a few open questions remain:
- The auth URL / client ID is embedded in some lazy-loaded partial HTML document, which we need to retrieve somehow. Maybe Dataverse can provide an API to retrieve the required information.
- What if multiple OIDC providers are configured for Dataverse? How do we select the correct one? Maybe we can go a step further than having Dataverse provide the auth URL and instead have it cooperate to do the login for us and pass down the cookie via a redirect URL, but I am not sure that's a good idea. I guess if there's an API, one could simply let the user decide and fall back to the first one.
- Where to store the Cookie/JWT? We don't want to perform the full ping pong for every request, that'd defeat the purpose. Only in memory (only makes limited sense for interactive applications)? Current working directory? In $XDG_CONFIG_HOME (~/.config by default)? In a credential manager? Let the user decide?
- What about OIDC providers where the callback_uri is stored? Should we maybe pick and document a specific port to open such that a http://localhost:PORT can be used? Is there a convention or spec for those local server ports?
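
To make the storage question a bit more concrete, here is a minimal sketch of the config-directory option. Nothing here is decided: the `pydataverse/token.json` location, the function names and the per-server JSON layout are assumptions for illustration only. It follows the XDG convention of falling back to `~/.config` when `XDG_CONFIG_HOME` is unset or empty, and creates the file readable only by the current user.

```python
import json
import os
from pathlib import Path
from typing import Dict, Optional


def token_path(app_name: str = "pydataverse") -> Path:
    """Hypothetical token file location below the XDG config directory."""
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / app_name / "token.json"


def _load_all(path: Path) -> Dict[str, str]:
    """Return the stored mapping of server URL to token, or {} if none exists."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def save_token(token: str, server_url: str, path: Path) -> None:
    """Store the token for one Dataverse server, keeping tokens of other servers."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tokens = _load_all(path)
    tokens[server_url] = token
    # Mode 0o600 applies when the file is created; on Windows it has
    # only limited effect.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(tokens, handle)


def load_token(server_url: str, path: Path) -> Optional[str]:
    """Return the stored token for server_url, or None if there is none."""
    return _load_all(path).get(server_url)
```

Keying the file by server URL would let one user stay logged in to several Dataverse installations at once. A credential manager would be the safer choice where one is available, since this sketch stores the token in plain text.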
Anyways, this was a very productive session and we plan to continue with an actual implementation for pyDataverse next week.