spark-operator Add kerberos changes for secure hadoop access

Oct 21 '20 18:10 chrevanthreddy

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

Oct 21 '20 18:10 google-cla[bot]

@googlebot I signed it!

Oct 21 '20 18:10 chrevanthreddy

@liyinan926 We have merged the Kerberos changes for accessing secure Hadoop into the latest version of the Spark operator at Comcast and are currently using them. @CsatariGergely @breetasinha1109 Could you please verify whether the changes are good?

Oct 21 '20 18:10 chrevanthreddy

@chrevanthreddy Sure, we'll review the changes and get back to you with any comments. Thanks!

Oct 22 '20 10:10 breetasinha1109

Hi @chrevanthreddy,

The changes look fine. I've left one comment in submission.go.

Also, it would be better to add Kerberos-related documentation to the User Guide. You can refer to the documentation added in https://github.com/nokia/spark-on-k8s-operator/pull/7/files (docs/user-guide.md).

Thanks and Regards, Breeta

Nov 04 '20 04:11 breetasinha1109

@chrevanthreddy Is there an example Spark program that uses this feature? Thanks!

Nov 18 '20 07:11 joanjiao2016

Hi @breetasinha1109, I saw that this PR (https://github.com/nokia/spark-on-k8s-operator/pull/7/files) supports Kerberos in Spark 2.4.5. Is that right?

I then tried to use the Kerberos-related configuration parameters (spark.kerberos.principal, spark.kerberos.keytab), but they did not take effect in Spark 2.4.5.

What did I do wrong? Or have you made other changes to Spark? If so, could you share them?

By the way, following this PR I have been able to connect to a Kerberized HDFS, but only with a Spark 3.0 image. What should I do to make it work with Spark 2.4+? Thanks!

Nov 18 '20 11:11 kz33

Hi @kz33,

Spark 2.x does not support Kerberos on Kubernetes. We made some internal code changes to Spark to support Kerberos in Spark 2.x. Spark 3.x supports Kerberos out of the box, so this PR targets Spark 3.x.

Thanks!
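For reference, here is a minimal sketch of what relying only on Spark 3.x's built-in Kerberos support might look like through plain `sparkConf`, without the `kerberos` block proposed in this PR. The property names are Spark 3.x configuration keys. The keytab path `/mnt/secrets/user.keytab` and the principal are hypothetical placeholders. The sketch assumes the keytab is readable at that path by the process running spark-submit. With this operator, that process is typically the operator pod, which is an assumption about your deployment, not something this PR sets up.

```yaml
# Sketch only: Spark 3.x native Kerberos settings passed via sparkConf.
# The keytab path and all <...> values are placeholders/assumptions.
spec:
  sparkVersion: "3.0.0"
  sparkConf:
    # Principal to log in as, plus a keytab the submitting process can read
    "spark.kerberos.principal": "<PRINCIPAL>"
    "spark.kerberos.keytab": "/mnt/secrets/user.keytab"   # hypothetical path
    # ConfigMap containing krb5.conf, mounted on the driver and executors
    "spark.kubernetes.kerberos.krb5.configMapName": "<KRB5_CONFIG_MAP_NAME>"
    # ConfigMap containing HADOOP_CONF_DIR files (core-site.xml, hdfs-site.xml, ...)
    "spark.kubernetes.hadoop.configMapName": "<HADOOP_CONFIG_MAP_NAME>"
```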

Nov 23 '20 04:11 breetasinha1109

Hi @joanjiao2016,

You can refer to the simple examples below:

Example 1 -

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  hadoopConfigMap: "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
    krb5ConfigMap: "<KRB5_CONFIG_MAP_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
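Example 1 refers to a keytab Secret and a krb5 ConfigMap that must already exist. Below is a minimal sketch of those two objects. The names `spark-keytab` and `krb5-conf`, the realm `EXAMPLE.COM`, and the KDC host are hypothetical. Two further points are assumptions, not taken from the PR: that `keytabName` is the data key inside the Secret, and that the krb5 ConfigMap stores the file under the key `krb5.conf`.

```yaml
# Sketch: supporting objects for Example 1's kerberos block.
# Names, realm, and key layout are assumptions, not from the PR.
apiVersion: v1
kind: Secret
metadata:
  name: spark-keytab            # would be <KEYTAB_SECRET>
  namespace: default
type: Opaque
data:
  # Key assumed to match <KEYTAB_FILE_NAME>; value is the base64-encoded keytab
  user.keytab: <BASE64_ENCODED_KEYTAB>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: krb5-conf               # would be <KRB5_CONFIG_MAP_NAME>
  namespace: default
data:
  # Placeholder Kerberos client configuration
  krb5.conf: |
    [libdefaults]
      default_realm = EXAMPLE.COM
    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
      }
```

Instead of base64-encoding by hand, `kubectl create secret generic spark-keytab --from-file=user.keytab=./user.keytab` builds an equivalent Secret from a local file.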

Example 2 -

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  sparkConf:
    "spark.kubernetes.kerberos.krb5.configMapName" : "<KRB5_CONFIG_MAP_NAME>"
    "spark.kubernetes.hadoop.configMapName" : "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
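Both examples also reference a Hadoop ConfigMap (`hadoopConfigMap` in Example 1, `spark.kubernetes.hadoop.configMapName` in Example 2). Spark mounts this ConfigMap's files as the HADOOP_CONF_DIR. The sketch below shows its shape, with one data key per config file. The name `hadoop-conf`, the NameNode address, and the principal are placeholders for your cluster's real values.

```yaml
# Sketch: a ConfigMap whose keys are HADOOP_CONF_DIR file names.
# All hostnames, ports, and principals are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf             # would be <HADOOP_CONFIG_MAP_NAME>
  namespace: default
data:
  core-site.xml: |
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <configuration>
      <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>hdfs/_HOST@EXAMPLE.COM</value>
      </property>
    </configuration>
```

If you already have the client configs on disk, `kubectl create configmap hadoop-conf --from-file=/etc/hadoop/conf/` creates one key per file in that directory.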

Hi @chrevanthreddy, it would be better to include the spark-kerberos examples under examples/.

Thanks and Regards, Breeta

Nov 23 '20 04:11 breetasinha1109

I want to access an HBase table with Kerberos authentication. Could you give me an example? Thanks a lot!

Jun 17 '21 10:06 joanjiao2016

@joanjiao2016 Please make sure hbase-site.xml is included in the Hadoop configs or Spark configs. Spark will then handle authentication to the HBase table as well.
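Here is a minimal sketch of what putting hbase-site.xml "in the Hadoop configs" could look like: an extra key in the same ConfigMap referenced by `hadoopConfigMap` / `spark.kubernetes.hadoop.configMapName`. The ConfigMap name, ZooKeeper hosts, and principals are hypothetical placeholders. My understanding is that Spark 3.x's built-in HBase token provider only requests a delegation token when `hbase.security.authentication` is `kerberos` and the HBase client jars are on the application classpath. Treat both as prerequisites to verify, not guarantees.

```yaml
# Sketch: hbase-site.xml added to the Hadoop ConfigMap.
# Merge this key into the existing data map (core-site.xml etc.);
# it is shown alone here for brevity. Values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf             # hypothetical; same ConfigMap as the Hadoop files
  namespace: default
data:
  hbase-site.xml: |
    <configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      </property>
      <property>
        <name>hbase.security.authentication</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hbase.master.kerberos.principal</name>
        <value>hbase/_HOST@EXAMPLE.COM</value>
      </property>
      <property>
        <name>hbase.regionserver.kerberos.principal</name>
        <value>hbase/_HOST@EXAMPLE.COM</value>
      </property>
    </configuration>
```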

Jun 23 '21 15:06 chrevanthreddy

Any reason this PR was never merged?

Mar 14 '23 16:03 neggert

@joanjiao2016 Is Kerberos authentication now supported in Spark 2.x?

May 06 '23 08:05 jiangjian0920

Why was this PR not merged?

Aug 23 '23 09:08 stephen-do
