
spec.driver.env/envFrom is not working even when webhook is enabled

Open shinen opened this issue 5 years ago • 12 comments

I deployed the Spark operator with manifest/spark-operator-with-webhook.yaml, with -enable-webhook=true. However, using env or envFrom does not inject the environment variables into the driver and executor pods.

What's the problem?

Following is how I define env in my spark application yml. I have tested deploying each of them separately.

Only envVars (which will be deprecated) works.

driver:
  env:
    - name: ENV_TWO
      value: hello
    - name: ENV_TWO
      valueFrom:
        secretKeyRef:
          name: secretenv
          key: TEST
  envFrom:
    - secretRef:
        name: secretenv
  envVars:
    TEST_ENVVARS: test

shinen avatar Dec 09 '20 06:12 shinen

env is not working for me as well. The spark application config shows it:

driver:
  env:
    - name: ENV1
      value: VAL1

But the env variables are not created inside the pods.

sakshi-bansal avatar Dec 14 '20 09:12 sakshi-bansal

Anyone can help?

shinen avatar Jan 05 '21 06:01 shinen

I faced the same problem, any solution?

nooshin-mirzadeh avatar Jan 12 '21 12:01 nooshin-mirzadeh

Hi all, I have not encountered this problem myself, but I am happy to help you troubleshoot. Here is what worked for me.

Create the secret with kubectl create -f spark-secret-env.yaml; the content is as follows:

apiVersion: v1
kind: Secret
metadata:
  name: spark-secret-env
  namespace: szww
type: Opaque
data:
  password: cGFzc3dvcmQK
  username: YWRtaW4=
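
As a quick sanity check (not operator-specific), the base64 values in that Secret decode to the credentials we expect to see inside the pod later:

```shell
# Decode the Secret's base64 data to confirm what the pod should receive
echo 'YWRtaW4=' | base64 -d       # admin
echo 'cGFzc3dvcmQK' | base64 -d   # password (this encoding includes a trailing newline)
```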

Now I create a SparkApplication with the following YAML:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-test-env
  namespace: szww
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.0.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
  sparkVersion: "3.0.0"
  arguments:
    - "10000"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    env:
    - name: "ENV1"
      value: "VAL1"
    - name: "USER"
      valueFrom:
        secretKeyRef:
          name: spark-secret-env
          key: username
    envFrom:
    - secretRef:
        name: spark-secret-env
    labels:
      version: 3.0.0
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    env:
    - name: "ENV1"
      value: "VAL1"
    - name: "USER"
      valueFrom:
        secretKeyRef:
          name: spark-secret-env
          key: username
    envFrom:
      - secretRef:
          name: spark-secret-env
    labels:
      version: 3.0.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Then I run the following command to enter the container and check whether the environment variables were injected.

# kubectl -n szww exec -it spark-pi-test-env-driver bash

Inside the container I was able to read the environment variables:

185@spark-pi-test-env-driver:~/work-dir$ echo $ENV1
VAL1
185@spark-pi-test-env-driver:~/work-dir$ echo $USER
admin
185@spark-pi-test-env-driver:~/work-dir$ echo $username
admin
185@spark-pi-test-env-driver:~/work-dir$ echo $password
password

kz33 avatar Jan 26 '21 10:01 kz33

@kz33 Tried your approach, and the environment variables are still not making their way into the containers.

@nooshin-mirzadeh @shinen @sakshi-bansal

Is this still an issue for you?

I tested on my end, and even with the webhook enabled, the environment variables are not injected.

vvavepacket avatar Feb 06 '21 16:02 vvavepacket

This is now working. Just make sure you update to the latest Helm chart.

vvavepacket avatar Feb 06 '21 17:02 vvavepacket

Hm, I am still hitting this on the latest Helm chart.

mtaron avatar Feb 10 '21 04:02 mtaron

I am having the same problem with chart version 1.0.7, running on EKS 1.18. As a workaround, I set the variables directly in sparkConf:

spec:
  sparkConf:
    "spark.kubernetes.driverEnv.[EnvironmentVariableName]": "value"

5RK7N avatar Mar 02 '21 17:03 5RK7N

env and envFrom were not working with chart 1.0.7, operator v1beta2-1.2.0-3.0.0, Kubernetes v1.19.7.

This seems to be an issue with Kubernetes v1.19, which is built with Go 1.15.

Installed with:

helm upgrade --install --version 1.0.7 sparkoperator spark-operator/spark-operator \
		--namespace spark-operator --set sparkJobNamespace=spark-apps,webhook.enable=true \
		--set image.tag=v1beta2-1.2.0-3.0.0

From the API server logs (v1.19.7):

W0323 17:54:05.005825       1 dispatcher.go:170] Failed calling webhook, failing open webhook.sparkoperator.k8s.io: failed calling webhook "webhook.sparkoperator.k8s.io": Post "https://sparkoperator-spark-operator-webhook.spark-operator.svc:443/webhook?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
E0323 17:54:05.005857       1 dispatcher.go:171] failed calling webhook "webhook.sparkoperator.k8s.io": Post "https://sparkoperator-spark-operator-webhook.spark-operator.svc:443/webhook?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
I0323 17:54:08.140788       1 client.go:360] parsed scheme: "passthrough"
I0323 17:54:08.140836       1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://10.157.149.99:2379  <nil> 0 <nil>}] <nil> <nil>}
I0323 17:54:08.140846       1 clientconn.go:948] ClientConn switching balancer to "pick_first"
W0323 17:54:13.251479       1 dispatcher.go:170] Failed calling webhook, failing open webhook.sparkoperator.k8s.io: failed calling webhook "webhook.sparkoperator.k8s.io": Post "https://sparkoperator-spark-operator-webhook.spark-operator.svc:443/webhook?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
E0323 17:54:13.251512       1 dispatcher.go:171] failed calling webhook "webhook.sparkoperator.k8s.io": Post "https://sparkoperator-spark-operator-webhook.spark-operator.svc:443/webhook?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
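
The x509 errors above come from Go 1.15 (which the Kubernetes v1.19 API server is built with) rejecting serving certificates that carry only a legacy Common Name and no SubjectAltName. The certificate shape being complained about can be reproduced locally (a sketch, not the operator's actual cert generation; paths and the CN are illustrative):

```shell
# Generate a self-signed cert with only a CN, the shape Go 1.15+ rejects
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/webhook-key.pem -out /tmp/webhook-cert.pem \
  -subj "/CN=sparkoperator-spark-operator-webhook.spark-operator.svc" 2>/dev/null

# The resulting cert has no Subject Alternative Name extension
if openssl x509 -in /tmp/webhook-cert.pem -noout -text | grep -q 'Subject Alternative Name'; then
  echo "SAN present"
else
  echo "no SAN: Go 1.15+ clients reject this cert"
fi
```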

Operator v1beta2-1.2.2-3.0.0 seems to fix the above error on Kubernetes v1.19.7, via https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/1027

chaudhryfaisal avatar Mar 20 '21 19:03 chaudhryfaisal

EKS 1.18 is also not working for me.

debu99 avatar Apr 14 '21 17:04 debu99

I spent many hours trying to troubleshoot this issue today. I posted my findings here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1229#issuecomment-827896078

TL;DR: While I could not get the env or envFrom methods to work, I was able to get unblocked (for now) using envSecretKeyRefs.
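
For reference, envSecretKeyRefs maps an environment variable name to a Secret key. A sketch of that workaround, assuming the secretenv Secret and TEST key from the original post (TEST_FROM_SECRET is a placeholder variable name):

```yaml
driver:
  envSecretKeyRefs:
    TEST_FROM_SECRET:   # env var name to create in the pod
      name: secretenv   # Secret name from the original post
      key: TEST
executor:
  envSecretKeyRefs:
    TEST_FROM_SECRET:
      name: secretenv
      key: TEST
```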

swill avatar Apr 27 '21 20:04 swill

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 14 '24 04:10 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Nov 03 '24 06:11 github-actions[bot]