vault icon indicating copy to clipboard operation
vault copied to clipboard

Vault OIDC flow breaks fails to prompt users to login when their client tokens have expired

Open BrandonIngalls opened this issue 1 year ago • 15 comments

Describe the bug

In older versions of vault a user would be prompted to log back into Vault when their Web UI client token had expired.

Newer versions of vault do not prompt users to log back in, instead they just send an error message back to the oauth client that initiated the OIDC auth flow.

This is annoying because I have to tell those using our vault deployment that they need to go and manually log into vault before they can use any application we have OIDC enabled on.

To Reproduce Steps to reproduce the behavior:

Run two versions of vault:

  1. a version of vault where this was not an issue: 1.15.6
  2. a recent version of vault where OIDC flows are busted: 1.17.2 (I think it broke in 1.16)

I have an example docker-compose.yml that I have been using to reproduce this issue

I also have a terraform file (main.tf) that I use to quickly configure the dev vault instances in a similar way.

This terraform state does the following:

  1. setup a vault user + userpass
  2. configure vault to allow CORS from an OIDC Debugging service (https://oidcdebugger.com/)
  3. add a public oidc client for the OIDC debugger

For my testing I do the following:

# start the two versions of vault
[~]$ docker compose up

# configure terraform
[~]$    terraform init \
; terraform workspace new broken \
; terraform workspace new working

# bootstrap the older version of vault that wasn't busted (v1.15.6)
[~]$ terraform workspace select working

[~]$ terraform apply -var=vault_addr=http://localhost:8200

# bootstrap a more modern version where vault appears busted
[~]$ terraform workspace select broken

[~]$ terraform apply -var=vault_addr=http://localhost:8201

I then open two tabs of https://oidcdebugger.com/ and fill out the required information to test out the OIDC provider feature of Vault.

I then login to the vault ui for both versions using the test creds user:password

and manually revoke their web ui client tokens (this is so I don't need to wait for them to naturally expire)

[~]$ vault token revoke \
  -address=http://localhost:8200 \
    hvs....

[~]$ vault token revoke \
  -address=http://localhost:8201 \
    hvs....

I then start the OIDC flow in both of my OIDC debugger tabs.

You'll see that the older version of vault will prompt you to login again, and if you enter the test creds again vault will complete the OIDC flow and the OIDC debugger will be happy.

In the modern version of Vault the vault ui will flash by in an instant as you are imidetily redirected back to the OIDC debugger with an error saying your client token is not valid.

Expected behavior Vault should prompt the user to re authenticate to vault when an OIDC client redirects a user to Vault to complete an auth flow and their vault web ui client token has expired like it did in previous versions of Vault.

Environment:

  • Vault Server Version (retrieve with vault status): 1.17.2
  • Vault CLI Version (retrieve with vault version): 1.17.2
  • Server Operating System/Architecture: Linux

Vault server configuration file(s): -dev is enough to showcase this issue.

Additional context n/a

BrandonIngalls avatar Jul 12 '24 19:07 BrandonIngalls

I can confirm that this is still a problem on 1.17.5

thanks4wifi avatar Sep 11 '24 20:09 thanks4wifi

still a problem on 1.17.6

diogoserrano avatar Sep 26 '24 09:09 diogoserrano

still a problem on 1.18.0. if anyone wants to use the oidc feature, needs to use a version before 1.17.x

diogoserrano avatar Oct 09 '24 09:10 diogoserrano

still a problem on 1.18.0. if anyone wants to use the oidc feature, needs to use a version before 1.17.x

Happening to us too, since the update

lcpandrade avatar Oct 09 '24 09:10 lcpandrade

its been months with this regression, so we changed our oidc provider to keycloak, not an issue anymore for us.

diogoserrano avatar Oct 18 '24 08:10 diogoserrano

its been months with this regression, so we changed our oidc provider to keycloak, not an issue anymore for us.

Thanks for the suggestion, keycloak looks like it will meet our needs too since Hashi is not supporting their OIDC feature.

BrandonIngalls avatar Oct 18 '24 12:10 BrandonIngalls

Nice repro steps!

Looks like this was indeed broken in 1.17.0, specifically by adding rather innocent condition in a MR.

Removing the problematic condition fixes the OIDC issue (and maybe introduces some new ones), if applied right after the problematic MR, but unfortunately doesn't work with current main branch. It starts working again on the HEAD of main if I remove all conditions in the code (~ 5) related to logical.ErrInvalidToken and revert the original MR. Changes uploaded oidc-fix branch

I suspect that there are some conditions in place that go crazy when they encounter different errors than those thrown in the past. I'll try to dig deeper into that

dezeroku avatar Oct 18 '24 22:10 dezeroku

Ok, so it looks like frontend and backend have different expectations of how the auth response should be formatted. This is visible for requests to the following endpoints (at least):

  • /v1/identity/oidc/provider/default/authorize
  • /v1/sys/internal/ui/resultant-acl

When auth fails, backend responds with 403 and attaches JSON response describing exact errors that occurred. So for the "correct" versions <1.17.0 we have response that looks like this:

errors: [
    "permission denied"
]

On the frontend side we iterate over the list of returned errors and if we find permission denied string a redirect to auth website is performed. This can be seen in code here

Now, with this PR a new error 'invalid token' was introduced to differentiate between "you have no permissions to access this endpoint" and "your token is expired/does not exist". With this change in place, the returned error list actually contains two errors:

  • "permission denied", same as in the past
  • "invalid token" because, well, in our case this token actually IS invalid as it's either expired or manually revoked by us for testing

This would work fine if the returned errors were formatted as a proper list, but they're not. For Vault versions >=1.17.0 we have response that looks like this:

"errors": [
	"2 errors occurred:\n\t* permission denied\n\t* invalid token\n\n"
]

As you can see, the returned error "list" is actually a single string that contains two errors and some boilerplate. This string follows the default format used by go-multierror package.

There are two ways to proceed here:

  1. properly format the output on backend's side (as a list)
  2. align on the frontend

I can't really comment on how big the change would be on backend, but on frontend a naive "fix" of matching the received errors "list" with regex and not exact match takes one line and seems to work fine: https://github.com/hashicorp/vault/commit/262da983094b64711692cb34e1bf3b3ceacfecae

dezeroku avatar Oct 19 '24 04:10 dezeroku

Came here from #28792, and want to point out that it affects any user of the API, not just the OIDC flow.

Because errors are returned as strings in the API, the formatting and content implicitly becomes part of the API contract. I have a tool which is (reasonably, IMO) looking for the exact string permission denied in the JSON errors array, and the change broke that. As a workaround we now do strings.Contains(err, "permission denied") but this is a generally imprecise check that I don't like having.

So a fix to the frontend would only fix the narrow issue of the OIDC flow but not the fact that this is effectively a breaking change to the Vault HTTP API.

joeshaw avatar Oct 30 '24 12:10 joeshaw

Hi @mpalmi, I saw your comment on #28792 that you will work on this. Have you had a chance to look into this?

gnadaban avatar Dec 10 '24 17:12 gnadaban

Thanks for checking in on this one, folks. While we're first planning to do a quick fix to the UI to get OIDC folks unblocked, we understand that it's just a band-aid and we need to dedicate some time to thinking through the API response (keeping in mind backwards compatibility) as it's obviously been pretty frustrating. We'll work on prioritizing the API response fix with the team in the short term as much as possible, but we unfortunately cannot promise an exact timeline yet due to other competing priorities.

digivava avatar Jan 29 '25 00:01 digivava

Hi to everyone, I can confirm after vault update using container stil exist the same issue, we are using 1.18.4 and have to rollback to 1.15.4 to keep running again oidc.

apenadiazApk avatar Feb 20 '25 15:02 apenadiazApk

Coming up on a year of Vault's OIDC feature being broken (v1.16.0 was tagged 2024-03-26).

BrandonIngalls avatar Mar 23 '25 20:03 BrandonIngalls

@digivava are y'all still planning on releasing a quick fix in the UI for this issue? Vault's OIDC capability is still broken to this day.

thanks4wifi avatar May 23 '25 15:05 thanks4wifi

Thanks everyone for your patience. I'll get this back in front of our engineering teams next week.

heatherezell avatar May 23 '25 20:05 heatherezell

Unfortunately this doesn't fix the underlying issue, which is that the Vault server is buggy and returning bad error messages. See #28792 for that, which was marked as a duplicate of this. I feel that we should reopen one of these two issues.

joeshaw avatar Jun 04 '25 21:06 joeshaw

For others following along, we do understand that this doesn't resolve the issue at the API level and just fixes it for the Vault UI. Please see this comment for the reasoning.

hellobontempo avatar Jun 05 '25 19:06 hellobontempo