[BUG] Service account not working in a different namespace
Description
I use the Helm chart of the Spark operator, deployed in the namespace spark-operator. In the HelmRelease I configure sparkJobNamespaces: spark-jobs, which is the namespace where I want to run the jobs. However, I'm getting this error:
Name: "pyspark-pi", Namespace: "spark-jobs"
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi" is forbidden: User "system:serviceaccount:spark-jobs:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "spark-jobs"
@devscheffer Could you provide detailed information about how you installed the Helm chart? Was this service account spark-sa created by Helm or by yourself?
It is created by the Helm chart. Here is the HelmRelease:
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  labels:
    app: spark-operator
  name: spark-operator
  namespace: spark-operator
spec:
  chart:
    spec:
      chart: spark-operator
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: spark-operator
      version: 1.4.0
  interval: 5m0s
  releaseName: spark-operator
  values:
    image:
      repository: docker.io/kubeflow/spark-operator
      pullPolicy: IfNotPresent
      tag: ""
    rbac:
      create: false
      createRole: true
      createClusterRole: true
      annotations: {}
    serviceAccounts:
      spark:
        create: true
        name: "spark-sa"
        annotations: {}
      sparkoperator:
        create: true
        name: "spark-operator-sa"
        annotations: {}
    sparkJobNamespaces:
      - spark-operator
      - team-1
    webhook:
      enable: true
      port: 443
      portName: webhook
      namespaceSelector: ""
      timeout: 30
    metrics:
      enable: true
      port: 10254
      portName: metrics
      endpoint: /metrics
      prefix: ""
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
        effect: "NoSchedule"
```
It works when I apply the manifest manually through the terminal; however, when I execute it from Airflow I get this error:

```
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi2" is forbidden: User "system:serviceaccount:team-1:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "team-1"
```

Here is the task in Airflow:
```python
spark_kpo = KubernetesPodOperator(
    task_id="kpo",
    name="spark-app-submission",
    namespace=namespace,
    image="bitnami/kubectl:1.28.11",
    cmds=["/bin/bash", "-c"],
    arguments=[f"echo '{spark_app_manifest_content}' | kubectl apply -f -"],
    in_cluster=True,
    get_logs=True,
    service_account_name=service_account_name,
    on_finish_action="keep_pod",
)
```
@devscheffer The service account spark-sa actually does not have any permissions on SparkApplication resources; it is used by the Spark driver pods. If you want to submit SparkApplications from Airflow, you can set the service account name in KubernetesPodOperator to spark-operator-sa instead. Or you can create a ServiceAccount manually and grant it full permissions on SparkApplication resources.
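For the second option, here is a minimal sketch of such a submitter ServiceAccount with namespace-scoped RBAC (the spark-submitter name and the team-1 namespace are illustrative, not something the chart renders):

```yaml
# Hypothetical submitter identity for manifests applied from Airflow.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-submitter
  namespace: team-1
---
# Grants full access to SparkApplication objects in team-1 only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-submitter
  namespace: team-1
rules:
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-submitter
  namespace: team-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-submitter
subjects:
  - kind: ServiceAccount
    name: spark-submitter
    namespace: team-1
```

With this applied, the KubernetesPodOperator above would pass service_account_name="spark-submitter".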
Hello. I'd like to say that I get the same result. I deployed chart v2.0.2 like so:
```bash
helm install spark-operator ./spark-operator \
  --version 2.0.2 \
  --create-namespace \
  --namespace spark-operator \
  --set 'spark.jobNamespaces={,airflow}' \
  --values ./values.yaml
```
The values.yaml for it was:
```yaml
nameOverride: ""
fullnameOverride: ""
commonLabels: {}
image:
  registry: docker.io
  repository: kubeflow/spark-operator
  tag: ""
  pullPolicy: IfNotPresent
  pullSecrets: []
controller:
  replicas: 1
  workers: 10
  logLevel: info
  uiService:
    enable: true
  uiIngress:
    enable: false
    urlFormat: ""
  batchScheduler:
    enable: true
    kubeSchedulerNames:
      - volcano
    default: ""
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  sidecars: []
  podDisruptionBudget:
    enable: false
    minAvailable: 1
  pprof:
    enable: false
    port: 6060
    portName: pprof
  workqueueRateLimiter:
    bucketQPS: 50
    bucketSize: 500
    maxDelay:
      enable: true
      duration: 6h
webhook:
  enable: true
  replicas: 1
  logLevel: info
  port: 9443
  portName: webhook
  failurePolicy: Fail
  timeoutSeconds: 10
  resourceQuotaEnforcement:
    enable: false
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  sidecars: []
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  podDisruptionBudget:
    enable: false
    minAvailable: 1
spark:
  jobNamespaces:
    - "airflow"
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
prometheus:
  metrics:
    enable: true
    port: 8080
    portName: metrics
    endpoint: /metrics
    prefix: ""
  podMonitor:
    create: true
    labels: {}
    jobLabel: spark-operator-podmonitor
    podMetricsEndpoint:
      scheme: http
      interval: 5s
```
Right after that, if I run a DAG from Airflow, the resulting spark-submit pod fails with the following error:
```
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '35324a3b-9f01-4c3b-bf56-445ea8746423', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '8bae74e0-9f4b-483f-8878-77b94fe77097', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b1662841-0cf0-4ed4-8ade-b34262bca683', 'Date': 'Fri, 18 Oct 2024 08:05:50 GMT', 'Content-Length': '483'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-submit-soyzhqvo\" is forbidden: User \"system:serviceaccount:transgran-spreads:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow\"","reason":"Forbidden","details":{"name":"spark-submit-soyzhqvo","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}
```
This can be fixed by adding (a consolidated sketch follows below):

- to airflow-pod-launcher-role (Role):

  ```yaml
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - '*'
    verbs:
      - '*'
  ```

- to spark-operator-spark (RoleBinding):

  ```yaml
  - kind: ServiceAccount
    name: default
    namespace: airflow
  ```
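Put together, the patched resources might look like this sketch, assuming the names above; the roleRef and the elided existing rules/subjects are assumptions, not the chart's rendered output:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-pod-launcher-role
  namespace: airflow
rules:
  # ...existing pod-launcher rules stay as they are...
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - '*'
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-operator-spark
  namespace: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-operator-spark  # assumed to be the chart's spark RBAC role
subjects:
  # ...existing subjects stay as they are...
  - kind: ServiceAccount
    name: default
    namespace: airflow
```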
With all of the above in place, I'd like to ask: why weren't these fixes added by the Helm chart?
I think I have the same issue. I installed spark-operator v2.0.0 using the following command:
```bash
helm install vikas spark-operator/spark-operator --version v2.0.0 \
  --namespace spark-operator \
  --create-namespace --set "sparkJobNamespaces={testvikas}" \
  --set webhook.enable=true
```
The namespace testvikas already exists, but no service account was created for it.
Below is the output of my kubectl get serviceaccounts command:
I was expecting testvikas-spark-operator-spark to be shown in the output.
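One thing worth checking (an observation drawn from this thread, not a confirmed diagnosis): chart v2.x reads spark.jobNamespaces rather than the v1 top-level sparkJobNamespaces used in the command above, as the v2.0.2 values.yaml earlier in this thread shows, so the --set flag may have no effect. A minimal v2.x values override would look like this sketch:

```yaml
# Sketch of a v2.x values override: v2 charts read spark.jobNamespaces,
# not the v1 top-level sparkJobNamespaces key used in the helm command above.
spark:
  jobNamespaces:
    - testvikas
  serviceAccount:
    create: true  # let the chart create the spark service account in the job namespace
```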