SSO RBAC in 3.4 with managed namespace works differently
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with :latest
- [ ] I'd like to contribute the fix myself (see contributing guide)
What happened/what you expected to happen?
I upgraded Argo Workflows from 3.1.13 to 3.4.3. SSO authentication was working fine with 3.1.13; however, it doesn't seem to work with 3.4.3. The SSO configuration (Okta) has not changed.
When I open the UI and click the SSO login button, I get a red banner in the bottom-right corner saying Failed to load version/info Error: Unauthorized. After that, the web page just tries to load, and after some time the browser reports: test-ce-argo-server-integration.k8s.cnqr.tech didn't send any data. ERR_EMPTY_RESPONSE
Version
3.4.3
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
N/A - The issue happens at login time, so I can't run any workflow.
Logs from the workflow ~controller~ server
I am attaching the logs of the workflow server, because the error happens during authentication:
time="2022-11-07T13:37:32.115Z" level=info duration=2.752873ms method=GET path=/main.2430295409b8b54e52ad.js size=1471060 status=0
time="2022-11-07T13:37:32.117Z" level=info duration="23.156µs" method=GET path=index.html size=0 status=304
time="2022-11-07T13:37:33.935Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=GetUserInfo grpc.service=info.InfoService grpc.start_time="2022-11-07T13:37:33Z" grpc.time_ms=0.047 span.kind=server system=grpc
time="2022-11-07T13:37:33.935Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=GetInfo grpc.service=info.InfoService grpc.start_time="2022-11-07T13:37:33Z" grpc.time_ms=0.028 span.kind=server system=grpc
time="2022-11-07T13:37:33.935Z" level=info duration=1.628208ms method=GET path=/api/v1/userinfo size=56 status=401
time="2022-11-07T13:37:33.935Z" level=info duration=2.284157ms method=GET path=/api/v1/info size=56 status=401
time="2022-11-07T13:37:34.116Z" level=info duration="202.575µs" method=GET path=/assets/fonts/fa-solid-900.woff2 size=150472 status=0
time="2022-11-07T13:37:34.116Z" level=info duration="111.014µs" method=GET path=/assets/images/logo.png size=41464 status=0
time="2022-11-07T13:37:34.389Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=CollectEvent grpc.service=info.InfoService grpc.start_time="2022-11-07T13:37:34Z" grpc.time_ms=0.03 span.kind=server system=grpc
time="2022-11-07T13:37:34.389Z" level=info duration="438.543µs" method=POST path=/api/v1/tracking/event size=56 status=401
time="2022-11-07T13:37:36.819Z" level=info duration="67.668µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:37:56.819Z" level=info duration="68.792µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:38:16.819Z" level=info duration="74.393µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:38:36.819Z" level=info duration="83.064µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:38:56.819Z" level=info duration="68.599µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:39:16.819Z" level=info duration="81.42µs" method=GET path=index.html size=473 status=0
time="2022-11-07T13:39:36.819Z" level=info duration="72.698µs" method=GET path=index.html size=473 status=0
Logs from your workflow's wait container
N/A
This is the service account configured for RBAC:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  annotations:
    # The rule is an expression used to determine if this service account
    # should be used.
    # * `groups` - an array of the OIDC groups
    # * `iss` - the issuer ("argo-server")
    # * `sub` - the subject (typically the username)
    # Must evaluate to a boolean.
    # If you want an account to be the default to use, this rule can be "true".
    # Details of the expression language are available in
    # https://github.com/antonmedv/expr/blob/master/docs/Language-Definition.md.
    workflows.argoproj.io/rbac-rule: "true"
    # The precedence is used to determine which service account to use whe
    # Precedence is an integer. It may be negative. If omitted, it defaults to "0".
    # Numerically higher values have higher precedence (not lower, which maybe
    # counter-intuitive to you).
    # If two rules match and have the same precedence, then which one used will
    # be arbitrary.
    workflows.argoproj.io/rbac-rule-precedence: "0"
Please note that this bug is present also with version 3.4.2. I rolled back to 3.1.13 and it's working again.
If I compare the logs, it looks like the issue is the 401 returned when calling the /api endpoints.
@sarabala1979 thanks for looking at it. Please let me know if you want to discuss it with a live demo. We can book some time and I can share with you what I see. Thank you.
It still works for me in 3.4.3. I use Dex, not Okta.
I thought that I was also affected by this issue or something similar. But for me the problem was running Kubernetes 1.25. Starting with Kubernetes 1.24 service account tokens are no longer generated automatically and I had to create an empty secret with appropriate annotation to get the token that Argo Workflows tries to read. See https://github.com/argoproj/argo-workflows/blob/master/docs/manually-create-secrets.md.
I was getting this error message in the server's logfile:
time="2022-11-14T17:12:21.485Z" level=error msg="failed to perform RBAC authorization" error="failed to get service account secret: secrets \"argo-workflows-server.service-account-token\" not found"
Leaving this note as it might help someone else who's searching through the issues.
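For readers on Kubernetes 1.24 or later, the manually created secret described above might look like the following. This is a minimal sketch: the secret name is taken from the error message, and the service account name argo-workflows-server is an assumption inferred from that secret name, so adjust both to your installation.

```yaml
# Sketch only: token Secret for a ServiceAccount on Kubernetes >= 1.24.
apiVersion: v1
kind: Secret
metadata:
  # Name taken from the "not found" error in the server log above.
  name: argo-workflows-server.service-account-token
  annotations:
    # Assumed ServiceAccount name; Kubernetes fills in the token for this account.
    kubernetes.io/service-account.name: argo-workflows-server
type: kubernetes.io/service-account-token
```

The secret has to live in the same namespace as the service account it names. Once it exists, the control plane populates the token data.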
Thanks for your update @elemental-lf - in my case I am using Kubernetes 1.19 so I shouldn't be affected. But thanks for pointing this out, I'd missed this info in my initial post.
The red banner appears before you’re logged in. Can you try deleting cookies and logging back in? Ignore the banner.
I actually did this by using Chrome in Incognito mode and I get the Okta page back. But once I try to log in, it just spins and then returns: test-ce-argo-server-integration.k8s.cnqr.tech didn't send any data. ERR_EMPTY_RESPONSE
We haven't changed anything on the Okta side, so I guess we are missing something in the request?
I think this might be fixed by #10046
My k8s version is v1.23.10, and I am using the latest argo server image with digest sha256:744501b36420f42eb33628206449bce4654604046baf19b193cbae4b25621291. I am still stuck on this issue.
My SSO server is the Argo CD Dex.
The browser reports a 401 for /api/v1/userinfo.
> I thought that I was also affected by this issue or something similar. But for me the problem was running Kubernetes 1.25. Starting with Kubernetes 1.24 service account tokens are no longer generated automatically and I had to create an empty secret with appropriate annotation to get the token that Argo Workflows tries to read. See https://github.com/argoproj/argo-workflows/blob/master/docs/manually-create-secrets.md.
> I was getting this error message in the server's logfile:
> time="2022-11-14T17:12:21.485Z" level=error msg="failed to perform RBAC authorization" error="failed to get service account secret: secrets \"argo-workflows-server.service-account-token\" not found"
> Leaving this note as it might help someone else who's searching through the issues.
@LinuxSuRen could this be your case?
I don't know how, but it works now. Thanks @vitalyrychkov
FYI: SSO seems to work in v3.4.4 in single-namespace mode but not in managed namespace mode. This makes me skeptical about the HTTP proxy fix. I have not checked the cluster install. The "latest" images for workflows do not seem to fix this issue yet. Has the HTTP proxy fix been included in the latest image?
It worked in v3.3.5
> I think this might be fixed by #10046
@simox-83 Were you able to verify that this fix resolved your issue? Which install mode do you have: namespace, cluster or managed namespace?
OK. Here's the issue, I think, which has nothing to do with proxy.
In v3.3.5 I was able to configure SSO RBAC by defining a role and binding in the target namespace, bound to an annotated service account in the server namespace, and it worked. This no longer works in v3.4.4.
In v3.4.4 I have to configure SSO RBAC by defining the role and binding in the target namespace, with the annotated service account ALSO in the target namespace instead of the server namespace. This SSO RBAC configuration does not work in v3.3.5.
Whether or not I defined SSO_DELEGATE_RBAC_TO_NAMESPACE=true had no bearing in either case.
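The v3.4.4 layout described above (annotated service account, role, and binding all in the target namespace) could be sketched as follows. All names are hypothetical: team-a stands for the managed namespace and sso-user for the service account. The rule and verbs are placeholders, not a recommended policy.

```yaml
# Sketch of the layout reported to work in v3.4.4 with a managed namespace.
# "team-a" and "sso-user" are hypothetical names.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sso-user
  namespace: team-a            # the managed (target) namespace, not the server namespace
  annotations:
    workflows.argoproj.io/rbac-rule: "true"
    workflows.argoproj.io/rbac-rule-precedence: "0"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sso-user-workflows
  namespace: team-a
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflows"]
    verbs: ["get", "list", "watch"]   # placeholder permissions
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sso-user-workflows
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: sso-user-workflows
subjects:
  - kind: ServiceAccount
    name: sso-user
    namespace: team-a          # in v3.3.5 this account lived in the server namespace instead
```

Per the report, v3.3.5 accepted the same Role and RoleBinding with the ServiceAccount (and the subject's namespace) in the server namespace, so only the account's location differs between the two setups.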
@simox-83 Can you confirm that this has been resolved in the latest versions? We might be able to patch 3.3, but it's unlikely. We want to make sure it was fixed by #10046 and is working in 3.4
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
> OK. Here's the issue, I think, which has nothing to do with proxy.
> In v3.3.5 I have been able to configure SSO RBAC by defining role and binding in target namespace to annotated service account in server namespace and it worked. This no longer works in v3.4.4.
> In v3.4.4 I have to configure SSO RBAC by defining role and binding in target namespace to annotated service account ALSO in target namespace instead of server namespace. This SSO RBAC configuration does not work in v3.3.5
> Whether or not I defined SSO_DELEGATE_RBAC_TO_NAMESPACE=true had no bearing in either case.
I faced the same issue and we are not using a proxy. We use a managed namespace, and we had to move the service account and the bindings from the argo server namespace to the managed namespace to make it work when upgrading from 3.3 to 3.4.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
Not stale. Needs fixing
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue seems to have become a hodgepodge of different SSO configuration issues, which makes it hard to act on, and many of the reports are missing reproduction details. As such I'm inclined to close this out. If you have a specific SSO issue, please file a new bug report with a reproducible configuration showing the bug.
-
> I think this might be fixed by #10046
OP's issue might have been fixed by this proxy change. OP never responded. But if they weren't sure, then it's hard to say what the root cause was to begin with.
-
> But for me the problem was running Kubernetes 1.25. Starting with Kubernetes 1.24 service account tokens are no longer generated automatically and I had to create an empty secret with appropriate annotation to get the token that Argo Workflows tries to read. See https://github.com/argoproj/argo-workflows/blob/master/docs/manually-create-secrets.md.
> I was getting this error message in the server's logfile:
> time="2022-11-14T17:12:21.485Z" level=error msg="failed to perform RBAC authorization" error="failed to get service account secret: secrets \"argo-workflows-server.service-account-token\" not found"
This appears to have been the problem for several people in this thread as well, and is unrelated to OP. The SA Secrets docs are now here (permalink): https://argo-workflows.readthedocs.io/en/release-3.5/service-account-secrets/
-
> The red banner appears before you’re logged in. Can you try deleting cookies and logging back in? Ignore the banner.
This is also a common issue. The banner and error message are not indicative of the root cause.
More logging was added in #11370, so if you get this and think it may be due to a misconfiguration and not just an invalid or expired token, check your Server logs preceding this error.
We may remove this banner message because it is too generic and sometimes counter-productive, per #12070 and #12168. I need to investigate more whether we can disambiguate the error better at that phase or prior (most SSO errors happen during the callback which precedes the login, hence the preceding logs mentioned above).
-
> In v3.4.4 I have to configure SSO RBAC by defining role and binding in target namespace to annotated service account ALSO in target namespace instead of server namespace. This SSO RBAC configuration does not work in v3.3.5
> Whether or not I defined SSO_DELEGATE_RBAC_TO_NAMESPACE=true had no bearing in either case.
> I faced same issue and we are not using proxy. We use managed namespace and we had to move the service account and the bindings from argo server namespace to the managed namespace in order to make it work for upgrading from 3.3 to 3.4.
This managed namespace change -- without delegation -- sounds like a potential regression. I couldn't find in the 3.4 changelog where that might have happened though, nor by looking through the code. From the blame, the only thing I can think of off the top of my head is that #8555 maybe had a bug? Problematically, that appears to have also been a breaking change, one that has persisted to 3.5 too 😕. Fixing that would result in another breaking change 😕
> From the blame, only thing I can think of off the top of my head is that #8555 maybe had a bug? Problematically, that appears to have also been a breaking change, one that has persisted to 3.5 too 😕. Fixing that may result in another breaking change 😕
Yep, that PR appears to have caused a completely undocumented breaking change regression 😕 See my comments in https://github.com/argoproj/argo-workflows/pull/8555#discussion_r1579963621
That is pretty confusing behavior for managed namespaces, so I'm inclined to change it back... but two breaking changes are not great either...
We could patch both 3.4.x and 3.5.x, but it'd be a breaking patch then... 😕
Discussed in today's Contributor Meeting and the consensus was that we would add a note to the 3.4 upgrading guide about this unintentional bug / breaking change to SSO RBAC with managed namespaces, and then fix it in 3.6 with another note. Since this bug has existed for a while now (the entirety of 3.4 and 3.5), we don't want to break folks again in a patch release, so doing it in a minor will make things more clear
I'm a bit confused as to the current state of this feature. The issue is marked with "solution/workaround", but I don't think I understand. I can't seem to get namespace delegation to work when I put the service accounts into the rbac managed namespaces. It only works when my service accounts are in the workflow server namespace.
> I can't seem to get namespace delegation
Yes, namespace delegation specifically still works correctly. But if you turn it off and have a managed namespace in 3.4 or 3.5, your SAs will still have to be in the managed namespace (and moving them there is a workaround). See also my PR comment, which is the clearest isolated explanation: https://github.com/argoproj/argo-workflows/pull/8555#discussion_r1579963621
Or also, from an earlier comment above:
> Whether or not I defined SSO_DELEGATE_RBAC_TO_NAMESPACE=true had no bearing in either case.
^That should not be the case and is a bug.
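For reference, delegation is switched on through an environment variable on the argo-server container. A minimal fragment of the Deployment spec, assuming the container is named argo-server (the container name is an assumption about your manifest):

```yaml
# Fragment of the argo-server Deployment (under spec.template.spec).
containers:
  - name: argo-server          # assumed container name
    env:
      - name: SSO_DELEGATE_RBAC_TO_NAMESPACE
        value: "true"          # env values must be strings, hence the quotes
```

With the variable unset or set to "false", the 3.4 and 3.5 behavior discussed here still expects the annotated service accounts in the managed namespace.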
(also I deleted the comment I made a few min before this as I misread)
You also might be able to work around it by removing the managed namespace flag of the Server and making it cluster-level, but keeping its RBAC only for the managed namespace. I haven't tried that though.
Hey @agilgur5 thank you for the quick response. I was able to get it working with quite a bit of troubleshooting of some issues that were mostly due to my helm charts. I'll share my issues here in case it helps anyone else.
- I didn't have the annotation workflows.argoproj.io/service-account-token.name present on the service accounts.
- The namespace rolebinding needs to be placed in the managed namespace via the namespace metadata flag and must reference the server service account via its namespace in the "subjects" reference. (Potentially obvious, but I missed a field.)
- This was a tricky one. My helm templates were turning the workflows.argoproj.io/rbac-rule into double single quotes, i.e. '''engineering_infra_platform'' in groups' instead of "engineering_infra_platform" in groups. Make sure this is configured exactly. Oddly, this had previously been working in the non-namespaced mode.
- Another helm templating mistake. The workflows.argoproj.io/rbac-rule-precedence annotation was being rendered as "100" vs 100. This was due to a refactor I made where I was placing the value in a dictionary and referencing it. What was particularly tricky was that it did not show up any differently in k9s via the default view. It only showed up when I described the object.
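Putting the annotation-related points above together, the rendered service account might look like the sketch below. The account name and namespace are hypothetical; the group name comes from the comment above. One reading of the precedence point is that the stored value must be the bare digits, with no literal quote characters inside it. In YAML source the value is still quoted, because annotation values must be strings.

```yaml
# Sketch of the intended rendered output; name and namespace are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: engineering-infra-platform
  namespace: team-a
  annotations:
    # Single-quoted YAML string wrapping the double-quoted group name,
    # so the stored value is: "engineering_infra_platform" in groups
    workflows.argoproj.io/rbac-rule: '"engineering_infra_platform" in groups'
    # Quoted in YAML (annotations are strings); the stored value is the digits 100.
    workflows.argoproj.io/rbac-rule-precedence: "100"
    # Secret that holds this account's token (needed on Kubernetes >= 1.24).
    workflows.argoproj.io/service-account-token.name: engineering-infra-platform.service-account-token
```

Running kubectl get serviceaccount engineering-infra-platform -n team-a -o yaml shows the values as stored, which makes stray quote characters from templating visible.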