argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

Workflow API response failure due to RBAC restrictions

Open kenchan0130 opened this issue 1 year ago • 6 comments

Summary

If we are not authorized for a particular operation according to RBAC, the Workflow API will return an HTTP status 500 error. Not authorized, we expect the API to return a HTTP status 4xx error such as 403.

Server administrators often monitor the API for HTTP status errors at 5xx. However, since an unauthorized request is normal for the server, it is a false positive and turns into "The Boy Who Cried Wolf".

version: v3.3.8

Diagnostics

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ui-readonly
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/log
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflows
  - workfloweventbindings
  - workflowtemplates
  - cronworkflows
  - cronworkflows/finalizers
  - clusterworkflowtemplates
  verbs:
  - get
  - list
  - watch

For example, if the above privileges are assigned to a service account, the user using that service account will not be able to retry the workflow.

When the user performs a retry operation from UI, the following API was called:

PUT /api/v1/workflows/namespace/workflow-name/retry

The response to this would be a HTTP status of 500.

In this case, the error log for argocli is as follows.

time="2022-07-27T00:04:43.918Z" level=error msg="finished unary call with code Unknown" error="pods \"sample-workflow-pod-name\" is forbidden: User \"system:serviceaccount:argo:ui-readonly\" cannot delete resource \"pods\" in API group \"\" in the namespace \"sample-namespace\"" grpc.code=Unknown grpc.method=RetryWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2022-07-27T00:04:43Z" grpc.time_ms=49.319 span.kind=server system=grpc

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

kenchan0130 avatar Jul 27 '22 05:07 kenchan0130

@kenchan0130 Make sense, Current API server is returning an error to the GRPC server and HTTP Mux is using DefaultHTTPProtoErrorHandler which is sending 500 for all errors. Two ways to fix it:

  1. Return status.Error(code, message) instead of an error. The util function will intercept the error return status. error. We need to change all handles
  2. Add custom HTTPProtoErrorHandler to intercept all errors and return the appropriate error. code

@alexec Please comment on this. @kenchan0130 Do you like to contribute to this fix?

sarabala1979 avatar Jul 27 '22 15:07 sarabala1979

I find it difficult for me to work on my contribution as it would affect all APIs and would require design skills.

kenchan0130 avatar Jul 28 '22 15:07 kenchan0130

@alexec Please comment on this.

Seems good to me.

alexec avatar Aug 02 '22 17:08 alexec

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 01 '22 17:10 stale[bot]

This is still not stale as far as I am aware.

kenchan0130 avatar Oct 03 '22 13:10 kenchan0130

@kenchan0130 Do you like to fix and submit the PR?

sarabala1979 avatar Oct 03 '22 18:10 sarabala1979

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 29 '22 05:10 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Dec 31 '22 22:12 stale[bot]