argo-workflows
argo-workflows copied to clipboard
Workflow API response failure due to RBAC restrictions
Summary
If we are not authorized for a particular operation according to RBAC, the Workflow API will return an HTTP status 500 error. Not authorized, we expect the API to return a HTTP status 4xx error such as 403.
Server administrators often monitor the API for HTTP status errors at 5xx. However, since an unauthorized request is normal for the server, it is a false positive and turns into "The Boy Who Cried Wolf".
version: v3.3.8
Diagnostics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: ui-readonly
rules:
- apiGroups:
- ""
resources:
- pods
- pods/log
- events
verbs:
- get
- list
- watch
- apiGroups:
- argoproj.io
resources:
- workflows
- workfloweventbindings
- workflowtemplates
- cronworkflows
- cronworkflows/finalizers
- clusterworkflowtemplates
verbs:
- get
- list
- watch
For example, if the above privileges are assigned to a service account, the user using that service account will not be able to retry the workflow.
When the user performs a retry operation from UI, the following API was called:
PUT /api/v1/workflows/namespace/workflow-name/retry
The response to this would be a HTTP status of 500.
In this case, the error log for argocli is as follows.
time="2022-07-27T00:04:43.918Z" level=error msg="finished unary call with code Unknown" error="pods \"sample-workflow-pod-name\" is forbidden: User \"system:serviceaccount:argo:ui-readonly\" cannot delete resource \"pods\" in API group \"\" in the namespace \"sample-namespace\"" grpc.code=Unknown grpc.method=RetryWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2022-07-27T00:04:43Z" grpc.time_ms=49.319 span.kind=server system=grpc
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
@kenchan0130 Make sense, Current API server is returning an error to the GRPC server and HTTP Mux is using DefaultHTTPProtoErrorHandler
which is sending 500 for all errors.
Two ways to fix it:
- Return status.Error(code, message) instead of an error. The util function will intercept the error return status. error. We need to change all handles
- Add custom
HTTPProtoErrorHandler
to intercept all errors and return the appropriate error. code
@alexec Please comment on this. @kenchan0130 Do you like to contribute to this fix?
I find it difficult for me to work on my contribution as it would affect all APIs and would require design skills.
@alexec Please comment on this.
Seems good to me.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
This is still not stale as far as I am aware.
@kenchan0130 Do you like to fix and submit the PR?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.