spark-operator
Add Kerberos changes for secure Hadoop access
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.
What to do if you already signed the CLA
Individual signers
- It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
 
Corporate signers
- Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
 - The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
 - The email used to register you as an authorized contributor must also be attached to your GitHub account.
 
ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
@liyinan926 We have merged the Kerberos changes for accessing secure Hadoop into the latest version of the Spark operator at Comcast and are currently using it. @CsatariGergely @breetasinha1109 Could you please verify that the changes are good?
@chrevanthreddy Sure, we'll review and get back with comments, if any. Thanks!
Hi @chrevanthreddy,
The changes look fine. I have left one comment in submission.go.
Also, it would be better to add Kerberos-related documentation to the User Guide. You can refer to the documentation added in this PR: https://github.com/nokia/spark-on-k8s-operator/pull/7/files (docs/user-guide.md)
Thanks and Regards, Breeta
@chrevanthreddy Is there an example Spark program that uses this feature? Thanks!
Hi @breetasinha1109, I saw that this PR (https://github.com/nokia/spark-on-k8s-operator/pull/7/files) supports Kerberos in Spark 2.4.5. Is that right?
I then tried the Kerberos-related configuration parameters (spark.kerberos.principal, spark.kerberos.keytab), but they did not take effect in Spark 2.4.5.
What did I do wrong? Or have you made other changes to Spark itself? If so, could you share them?
BTW, following this PR, I have been able to connect to Kerberos-secured HDFS, but only with a Spark 3.0 image. What should I do to make it work with Spark 2.4+? Thanks!
Hi @kz33,
Kerberos is not supported in Spark 2.x. We made some internal code changes to Spark to support Kerberos in Spark 2.x. Spark 3.x supports Kerberos by default, so this PR is compatible with Spark 3.x.
Thanks!
Hi @joanjiao2016,
You can refer to the simple examples below -
Example 1 -
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  hadoopConfigMap: "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
    krb5ConfigMap: "<KRB5_CONFIG_MAP_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
Example 2 -
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  sparkConf:
    "spark.kubernetes.kerberos.krb5.configMapName" : "<KRB5_CONFIG_MAP_NAME>"
    "spark.kubernetes.hadoop.configMapName" : "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
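The placeholders in the two examples refer to Kubernetes objects that must already exist in the application's namespace. The manifests below are a minimal sketch of what they might look like, not something taken from this PR: the object names (`krb5-config`, `spark-keytab`), the realm, the KDC host, and the keytab file name are all hypothetical, and the assumption that the keytab Secret stores the file under a key equal to `keytabName` should be checked against the PR's code.

```yaml
# Hypothetical ConfigMap to use as <KRB5_CONFIG_MAP_NAME>.
# Assumption: the Kerberos client configuration is stored under the key "krb5.conf".
apiVersion: v1
kind: ConfigMap
metadata:
  name: krb5-config          # hypothetical name
  namespace: default
data:
  krb5.conf: |
    [libdefaults]
      default_realm = EXAMPLE.COM
    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
      }
---
# Hypothetical Secret to use as <KEYTAB_SECRET>.
# Assumption: the key name matches <KEYTAB_FILE_NAME> ("spark.keytab" here).
apiVersion: v1
kind: Secret
metadata:
  name: spark-keytab         # hypothetical name
  namespace: default
type: Opaque
data:
  spark.keytab: "<BASE64_ENCODED_KEYTAB>"
```

Equivalent objects can usually be created straight from local files with `kubectl create configmap krb5-config --from-file=krb5.conf` and `kubectl create secret generic spark-keytab --from-file=spark.keytab`, which avoids base64-encoding the keytab by hand.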
Hi @chrevanthreddy, it would be better to include spark-kerberos examples under examples/.
Thanks and Regards, Breeta
I want to access an HBase table with Kerberos authentication. Can you give me an example? Thanks a lot!
@joanjiao2016 Please make sure hbase-site.xml is in the Hadoop configs or Spark configs, and Spark will handle authentication to the HBase table as well.
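As a rough illustration of the suggestion above, hbase-site.xml can be added as one more key in the ConfigMap that `hadoopConfigMap` points to. This is a sketch under assumptions rather than a tested setup: the ConfigMap name, ZooKeeper hosts, and principals are hypothetical, and the HBase client jars are assumed to be present in the Spark image.

```yaml
# Hypothetical ConfigMap to use as <HADOOP_CONFIG_MAP_NAME>.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-config        # hypothetical name
  namespace: default
data:
  # Existing core-site.xml and hdfs-site.xml entries stay alongside this key.
  hbase-site.xml: |
    <configuration>
      <property>
        <name>hbase.security.authentication</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com</value>
      </property>
      <property>
        <name>hbase.master.kerberos.principal</name>
        <value>hbase/_HOST@EXAMPLE.COM</value>
      </property>
      <property>
        <name>hbase.regionserver.kerberos.principal</name>
        <value>hbase/_HOST@EXAMPLE.COM</value>
      </property>
    </configuration>
```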
Any reason this PR was never merged?
@joanjiao2016 Is Kerberos authentication now supported in Spark 2.x?
Why is this PR not merged?