Add kerberos changes for secure hadoop access

Open chrevanthreddy opened this issue 5 years ago • 16 comments

chrevanthreddy avatar Oct 21 '20 18:10 chrevanthreddy

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

google-cla[bot] avatar Oct 21 '20 18:10 google-cla[bot]

@googlebot I signed it!

chrevanthreddy avatar Oct 21 '20 18:10 chrevanthreddy

@liyinan926 We have merged the Kerberos changes for accessing secure Hadoop into the latest version of the Spark operator at Comcast and are currently using it. @CsatariGergely @breetasinha1109 Could you please verify that the changes are good?

chrevanthreddy avatar Oct 21 '20 18:10 chrevanthreddy

@CsatariGergely @breetasinha1109 Could you please verify if the changes are good.

@chrevanthreddy Sure, we'll review and get back with comments, if any. Thanks!

breetasinha1109 avatar Oct 22 '20 10:10 breetasinha1109

Hi @chrevanthreddy,

The changes look fine. I have left one comment in submission.go.

Also, it would be better to add Kerberos-related documentation to the User Guide. You can refer to the documentation added in the same PR - https://github.com/nokia/spark-on-k8s-operator/pull/7/files (docs/user-guide.md)

Thanks and Regards, Breeta

breetasinha1109 avatar Nov 04 '20 04:11 breetasinha1109

@chrevanthreddy Is there an example Spark program that uses this feature? Thanks!

joanjiao2016 avatar Nov 18 '20 07:11 joanjiao2016

Hi @breetasinha1109, I saw that this PR (https://github.com/nokia/spark-on-k8s-operator/pull/7/files) supports Kerberos in Spark 2.4.5, is that right?

I then tried to use these Kerberos-related configuration parameters (spark.kerberos.principal, spark.kerberos.keytab), but they did not take effect in Spark 2.4.5.

What did I do wrong? Or have you made other changes in Spark itself? If so, could you tell me what they are?

BTW, following this PR, I have been able to connect to Kerberos-secured HDFS, but only with a Spark 3.0 image. If I want to do the same in Spark 2.4+, what should I do? Thanks!

kz33 avatar Nov 18 '20 11:11 kz33

Hi @kz33,

In Spark 2.x, Kerberos is not supported on Kubernetes. We made some internal code changes in Spark to support Kerberos in Spark 2.x. In Spark 3.x, Kerberos is supported by default, so this PR is compatible with Spark 3.x.

Thanks!

breetasinha1109 avatar Nov 23 '20 04:11 breetasinha1109

Hi @joanjiao2016,

You can refer to the simple examples below -

Example 1 -

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  hadoopConfigMap: "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
    krb5ConfigMap: "<KRB5_CONFIG_MAP_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0

Example 2 -

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-kerberos
  namespace: default
spec:
  type: Python
  pythonVersion: "2"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.0.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  sparkConf:
    "spark.kubernetes.kerberos.krb5.configMapName" : "<KRB5_CONFIG_MAP_NAME>"
    "spark.kubernetes.hadoop.configMapName" : "<HADOOP_CONFIG_MAP_NAME>"
  kerberos:
    enabled: true
    kerberosPrincipal: "<PRINCIPAL>"
    keytabSecret: "<KEYTAB_SECRET>"
    keytabName: "<KEYTAB_FILE_NAME>"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
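
The placeholders in the examples above refer to Kubernetes objects that have to exist before the application is submitted. As a rough sketch only, they might look like the following. The object names (krb5-config, spark-keytab), the realm and KDC host, and the assumption that keytabName matches the key inside the Secret are all hypothetical and not confirmed by this PR; the krb5.conf key name follows what Spark 3.x expects for spark.kubernetes.kerberos.krb5.configMapName.

```yaml
# Hypothetical ConfigMap holding krb5.conf; its name would go in
# <KRB5_CONFIG_MAP_NAME>. Realm and KDC values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: krb5-config
  namespace: default
data:
  krb5.conf: |
    [libdefaults]
      default_realm = EXAMPLE.COM
    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
      }
---
# Hypothetical Secret holding the keytab; its name would go in
# <KEYTAB_SECRET>. Assumption: <KEYTAB_FILE_NAME> is the key used here.
apiVersion: v1
kind: Secret
metadata:
  name: spark-keytab
  namespace: default
type: Opaque
data:
  # Base64-encoded contents of the keytab file (left as a placeholder).
  spark.keytab: <BASE64_ENCODED_KEYTAB>
```

With objects like these, the kerberos block would reference krb5-config, spark-keytab, and spark.keytab in place of the corresponding placeholders.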

Hi @chrevanthreddy, it would be better to include spark-kerberos examples under examples/.

Thanks and Regards, Breeta

breetasinha1109 avatar Nov 23 '20 04:11 breetasinha1109

I want to access an HBase table with Kerberos authentication. Can you give me an example? Thanks a lot!

joanjiao2016 avatar Jun 17 '21 10:06 joanjiao2016

@joanjiao2016 Please make sure hbase-site.xml is included in the Hadoop configs or Spark configs; Spark will then handle authentication to the HBase table as well.
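
One way to read this suggestion is to ship hbase-site.xml in the same ConfigMap that is passed as hadoopConfigMap. The sketch below is illustrative only: the ConfigMap name, the ZooKeeper hosts, and the minimal set of properties are assumptions, and a real cluster will normally need more settings copied from its own hbase-site.xml and core-site.xml.

```yaml
# Hypothetical Hadoop config ConfigMap; its name would go in
# <HADOOP_CONFIG_MAP_NAME>. Property values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-config
  namespace: default
data:
  core-site.xml: |
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
  hbase-site.xml: |
    <configuration>
      <property>
        <name>hbase.security.authentication</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com</value>
      </property>
    </configuration>
```

The HBase client libraries also need to be on the driver and executor classpath in the Spark image for the application to reach HBase.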

chrevanthreddy avatar Jun 23 '21 15:06 chrevanthreddy

Any reason this PR was never merged?

neggert avatar Mar 14 '23 16:03 neggert

@joanjiao2016 Is Kerberos authentication now supported in Spark 2.x?

jiangjian0920 avatar May 06 '23 08:05 jiangjian0920

Why has this PR not been merged?

stephen-do avatar Aug 23 '23 09:08 stephen-do