
[BUG] Service account not working in a different namespace

Open devscheffer opened this issue 1 year ago • 4 comments

Description

I use the Helm chart of the Spark operator, deployed in the namespace spark-operator. In the HelmRelease I set sparkJobNamespaces: spark-jobs, which is the namespace where I want to run the jobs. However, I'm getting this error:

Name: "pyspark-pi", Namespace: "spark-jobs"
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi" is forbidden: User "system:serviceaccount:spark-jobs:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "spark-jobs"

devscheffer avatar Jul 26 '24 19:07 devscheffer

@devscheffer Could you provide detailed information about how you installed the Helm chart? Was the service account spark-sa created by Helm or by yourself?

ChenYi015 avatar Jul 27 '24 01:07 ChenYi015

It is created by Helm.

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  labels:
    app: spark-operator
  name: spark-operator
  namespace: spark-operator
spec:
  chart:
    spec:
      chart: spark-operator
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: spark-operator
      version: 1.4.0
  interval: 5m0s
  releaseName: spark-operator
  values:
    image:
      repository: docker.io/kubeflow/spark-operator
      pullPolicy: IfNotPresent
      tag: ""
    rbac:
      create: false
      createRole: true
      createClusterRole: true
      annotations: {}
    serviceAccounts:
      spark:
        create: true
        name: "spark-sa"
        annotations: {}
      sparkoperator:
        create: true
        name: "spark-operator-sa"
        annotations: {}
    sparkJobNamespaces:
      - spark-operator
      - team-1
    webhook:
      enable: true
      port: 443
      portName: webhook
      namespaceSelector: ""
      timeout: 30
    metrics:
      enable: true
      port: 10254
      portName: metrics
      endpoint: /metrics
      prefix: ""  
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
        effect: "NoSchedule"

It works when I apply the manifest manually from the terminal, but when I submit it from Airflow I get this error:

from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi2" is forbidden: User "system:serviceaccount:team-1:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "team-1"

Here is the task in Airflow:

spark_kpo = KubernetesPodOperator(
    task_id="kpo",
    name="spark-app-submission",
    namespace=namespace,
    image="bitnami/kubectl:1.28.11",
    cmds=["/bin/bash", "-c"],
    arguments=[f"echo '{spark_app_manifest_content}' | kubectl apply -f -"],
    in_cluster=True,
    get_logs=True,
    service_account_name=service_account_name,
    on_finish_action="keep_pod",
)
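An alternative to piping YAML through kubectl in the pod is to build the SparkApplication manifest as a Python dict and submit it with the official kubernetes client. This is a sketch, not code from the thread: the helper name, the image tag, and the pi.py example path are assumptions.

```python
# Sketch: build a minimal SparkApplication custom resource as a dict.
# The field values below (image, main file, Spark version) are
# illustrative assumptions, not taken from the issue.
def build_spark_application(name: str, namespace: str, image: str) -> dict:
    """Return a minimal SparkApplication manifest."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": "local:///opt/spark/examples/src/main/python/pi.py",
            "sparkVersion": "3.5.0",
            "driver": {"serviceAccount": "spark-sa"},
            "executor": {"instances": 1},
        },
    }

app = build_spark_application("pyspark-pi", "team-1", "spark:3.5.0")

# Submitting requires in-cluster access and the `kubernetes` package;
# the submitting service account still needs RBAC permissions on
# sparkapplications resources, which is the point of this issue.
# from kubernetes import client, config
# config.load_incluster_config()
# client.CustomObjectsApi().create_namespaced_custom_object(
#     group="sparkoperator.k8s.io", version="v1beta2",
#     namespace=app["metadata"]["namespace"], plural="sparkapplications",
#     body=app,
# )
```

Building the manifest in code avoids the shell-quoting pitfalls of `echo '{...}' | kubectl apply -f -`, but the RBAC requirement is identical either way.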

devscheffer avatar Jul 29 '24 10:07 devscheffer

@devscheffer The service account spark-sa actually does not have any permissions on SparkApplication resources; it is used by the Spark driver pods. If you want to submit a SparkApplication from Airflow, you can configure the service account name in KubernetesPodOperator to spark-operator-sa instead. Or you can create a ServiceAccount manually and grant it full permissions on SparkApplication resources.

ChenYi015 avatar Jul 29 '24 11:07 ChenYi015
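The manual option described above could look like the following manifests. This is a sketch: the name spark-submitter and the namespace team-1 are assumptions, not objects created by the chart.

```yaml
# Sketch: a dedicated submitter ServiceAccount with full access to
# SparkApplication resources in one job namespace. All names here
# are illustrative assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-submitter
  namespace: team-1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-submitter
  namespace: team-1
rules:
- apiGroups: ["sparkoperator.k8s.io"]
  resources: ["sparkapplications", "sparkapplications/status"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-submitter
  namespace: team-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-submitter
subjects:
- kind: ServiceAccount
  name: spark-submitter
  namespace: team-1
```

With something like this applied, the KubernetesPodOperator task would set service_account_name to the submitter account rather than the driver's spark-sa.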

Hello. I'd like to report that I see the same result. I deployed chart v2.0.2 like so:

helm install spark-operator ./spark-operator \
    --version 2.0.2 \
    --create-namespace \
    --namespace spark-operator \
    --set 'spark.jobNamespaces={,airflow}' \
    --values ./values.yaml

with the following values.yaml:

nameOverride: ""
fullnameOverride: ""
commonLabels: {}

image:
  registry: docker.io
  repository: kubeflow/spark-operator
  tag: ""
  pullPolicy: IfNotPresent
  pullSecrets: []

controller:
  replicas: 1
  workers: 10
  logLevel: info
  uiService:
    enable: true
  uiIngress:
    enable: false
    urlFormat: ""
  batchScheduler:
    enable: true
    kubeSchedulerNames:
      - volcano
    default: ""
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  sidecars: []
  podDisruptionBudget:
    enable: false
    minAvailable: 1
  pprof:
    enable: false
    port: 6060
    portName: pprof
  workqueueRateLimiter:
    bucketQPS: 50
    bucketSize: 500
    maxDelay:
      enable: true
      duration: 6h

webhook:
  enable: true
  replicas: 1
  logLevel: info
  port: 9443
  portName: webhook
  failurePolicy: Fail
  timeoutSeconds: 10
  resourceQuotaEnforcement:
    enable: false
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  sidecars: []
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  podDisruptionBudget:
    enable: false
    minAvailable: 1

spark:
  jobNamespaces:
  - "airflow"
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}

prometheus:
  metrics:
    enable: true
    port: 8080
    portName: metrics
    endpoint: /metrics
    prefix: ""
  podMonitor:
    create: true
    labels: {}
    jobLabel: spark-operator-podmonitor
    podMetricsEndpoint:
      scheme: http
      interval: 5s

Right after that, if I run a DAG from Airflow, the resulting spark-submit pod fails with the following error:

Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '35324a3b-9f01-4c3b-bf56-445ea8746423', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '8bae74e0-9f4b-483f-8878-77b94fe77097', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b1662841-0cf0-4ed4-8ade-b34262bca683', 'Date': 'Fri, 18 Oct 2024 08:05:50 GMT', 'Content-Length': '483'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-submit-soyzhqvo\" is forbidden: User \"system:serviceaccount:transgran-spreads:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow\"","reason":"Forbidden","details":{"name":"spark-submit-soyzhqvo","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}

This can be fixed by adding the following rule to the airflow-pod-launcher-role Role:

- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - '*'
  verbs:
  - '*'

and the following subject to the spark-operator-spark RoleBinding:

- kind: ServiceAccount
  name: default
  namespace: airflow

Given all of the above, I'd like to ask why these fixes weren't added by the Helm chart?

alexz0nder avatar Oct 18 '24 17:10 alexz0nder

I think I have the same issue. I installed spark-operator v2.0.0 using the following command:

helm install vikas spark-operator/spark-operator --version v2.0.0 \
    --namespace spark-operator \
    --create-namespace --set "sparkJobNamespaces={testvikas}"\
    --set webhook.enable=true

The namespace testvikas already exists, but no service account was created in it.

Below is the output of my kubectl get serviceaccounts command: [screenshot attached in the original issue]

I was expecting testvikas-spark-operator-spark to appear in the output.

vikas-saxena02 avatar Nov 12 '24 18:11 vikas-saxena02