incubator-livy
[LIVY-588]: Full support for Spark on Kubernetes
NOTE: this PR is deprecated and kept for discussion history only. Please refer to #249 for the latest state of the work.
What changes were proposed in this pull request?
This PR is a new feature proposal: full support for Spark on Kubernetes (inspired by the SparkYarnApp implementation).
Spark on Kubernetes has been available for quite a while now, so it makes sense for Livy to support it as well. It solves many of the problems of working with Spark on Kubernetes and can fully replace YARN when running on top of a Kubernetes cluster:
- Livy UI has a cached logs/diagnostics page
- Livy UI shows links to the Spark UI and the Spark History Server
- With a Kubernetes Ingress resource, Livy can be configured to serve as an orchestrator of Spark apps on Kubernetes (the PR includes an Nginx Ingress support option to create routes to the Spark UI)
- Nginx Ingress solves `basePath` support for the Spark UI and History Server, and has lots of auth integrations available: https://github.com/kubernetes/ingress-nginx
- Livy UI can be integrated with Grafana Loki logs (the PR provides a solution for that)
Dockerfiles repo: https://github.com/jahstreet/spark-on-kubernetes-docker
Helm charts: https://github.com/jahstreet/spark-on-kubernetes-helm
Associated JIRA: https://issues.apache.org/jira/browse/LIVY-588
Design concept: https://github.com/jahstreet/spark-on-kubernetes-helm/blob/develop/README.md
How was this patch tested?
Was tested manually on an AKS cluster (Azure Kubernetes Service), Kubernetes v1.11.8:
- Image: Spark 2.4.3 with Hadoop 3.2.0 (https://github.com/jahstreet/spark-on-kubernetes-docker)
- History Server: https://github.com/helm/charts/tree/master/stable/spark-history-server
- Jupyter Notebook with Sparkmagic: https://github.com/jahstreet/spark-on-kubernetes-helm/tree/master/charts/jupyter
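For context, a minimal sketch of reproducing a similar setup with the linked Helm charts. The chart paths, Helm 2 CLI syntax of that era, the `spark` namespace, and the release names are assumptions for illustration; adjust `values.yaml` to your cluster:

```bash
# Clone the charts repo and install Livy and Jupyter (illustrative release/namespace names).
git clone https://github.com/jahstreet/spark-on-kubernetes-helm
cd spark-on-kubernetes-helm
helm install ./charts/livy --name livy --namespace spark
helm install ./charts/jupyter --name jupyter --namespace spark
```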
What do you think about this?
@vanzin please take a look.
Just to set expectations, it's very unlikely I'll be able to look at this PR (or any other really) any time soon.
Well, then I'll try to prepare as much as I can until you become available. I hope someone from the community will be able to share feedback on the work done.
Codecov Report
Merging #167 into master will decrease coverage by 3.47%. The diff coverage is 26.71%.
@@ Coverage Diff @@
## master #167 +/- ##
============================================
- Coverage 68.6% 65.12% -3.48%
- Complexity 904 940 +36
============================================
Files 100 102 +2
Lines 5666 6291 +625
Branches 850 946 +96
============================================
+ Hits 3887 4097 +210
- Misses 1225 1614 +389
- Partials 554 580 +26
Impacted Files | Coverage Δ | Complexity Δ |
---|---|---|
...e/livy/server/interactive/InteractiveSession.scala | 68.75% <0%> (-0.37%) | 46 <0> (+2) |
...rver/src/main/scala/org/apache/livy/LivyConf.scala | 96.46% <100%> (+0.6%) | 22 <1> (+1) |
...ala/org/apache/livy/utils/SparkKubernetesApp.scala | 20.36% <20.36%> (ø) | 0 <0> (?) |
...main/scala/org/apache/livy/server/LivyServer.scala | 32.43% <33.33%> (-3.53%) | 11 <0> (ø) |
...ain/java/org/apache/livy/rsc/driver/RSCDriver.java | 79.25% <50%> (+1.28%) | 45 <0> (+4) |
...rc/main/scala/org/apache/livy/utils/SparkApp.scala | 67.5% <55.55%> (-8.5%) | 1 <0> (ø) |
...in/scala/org/apache/livy/repl/SQLInterpreter.scala | 62.5% <0%> (-7.88%) | 9% <0%> (+2%) |
...ain/scala/org/apache/livy/utils/SparkYarnApp.scala | 66.01% <0%> (-7.23%) | 40% <0%> (+7%) |
...n/scala/org/apache/livy/server/AccessManager.scala | 75.47% <0%> (-5.38%) | 46% <0%> (+2%) |
...cala/org/apache/livy/scalaapi/ScalaJobHandle.scala | 52.94% <0%> (-2.95%) | 7% <0%> (ø) |
... and 20 more |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7dee3cc...7f6ef8a. Read the comment docs.
I'm going to experiment with this a bit. We're running Spark on Kubernetes widely and we're also looking to migrate our notebook usage to Kubernetes. The benefits we see from Kubernetes are the elasticity with the associated cost savings, and the ability to track and analyse the resource usage of individual jobs closely.
From my quick glance at the source I will probably miss more extensive support for customizing the created drivers (I assume that Livy creates the drivers as pods in the cluster, which then create the executors). In our current usage of Spark on Kubernetes we supply about 20 different --conf options to the driver, some of which carry job-specific information such as name and owner.
Sounds cool, I'll be glad to assist you during the experiments. Maybe you can share the cases you are looking for a solution to; I'm sure this would be helpful for designing the requirements for the features to implement within this work.
By the way, in the near future I'll prepare guidelines for the deployment, customization, and usage options of Livy on Kubernetes. I will share the progress on that.
I built Livy on my own machine based on your branch and the Dockerfile in your repository. I got it running so that it created the driver pod, but I was unable to fully start the driver because I use my own Spark image, which requires some configuration parameters to be passed in.
Here's some feedback:
- The Helm chart doesn't allow specifying a "serviceAccount" property for Livy.
- Couldn't find a way to set the namespace which Livy must use. It seems to try to search all pods in all namespaces. Also need to set the namespace where the pods are created (seems to be fixed to "default").
- Could you provide a way to fully customise the driver pod specification? I would want to set custom volumes and volume mounts, environment variables, labels, sidecar containers, and possibly even customise the command line arguments for the driver.
- A way to provide custom Spark configuration settings for the driver pod would also be required.
- Support for macros for both customising the driver pod and the extra Spark configuration options. I would at least need the id of the Livy session (e.g. "livy-session-2-9SZP8Ijv") to be inserted into both the pod template and the Spark configuration options.
Unfortunately I don't know Scala very well, so I couldn't easily dig into the code to determine how this works, and I'm not able to provide you with more detailed recommendations.
@garo Thanks for the review.
Here are some explanations on your questions:
- The first version of the chart was done without RBAC support. I've just finished an RBAC support solution for the Livy chart but haven't merged it yet; you can refer to the feature branch https://github.com/jahstreet/spark-on-kubernetes-helm/blob/charts/livy/rbac-support/charts/livy/values.yaml:

      serviceAccount:
        # Specifies whether a service account should be created
        create: true
        # The name of the service account to use.
        # If not set and create is true, a name is generated using the fullname template
        name:
- Livy searches for a Driver Pod in all namespaces the first time (theoretically the user may want to submit the job to any namespace) in order to initialize the KubernetesApplication object; then it uses that object (which contains the namespace field) to get the Spark pods' states and logs, and it looks for that information only within the single target namespace (I've added comments to the lines where this logic is implemented).
- By default Livy should submit the Spark app to the `default` namespace (if it does not, then I need to make a fix ;) ). You can change that behavior by adding `spark.kubernetes.namespace=<desired_namespace>` to /opt/spark/conf/spark-defaults.conf in the Livy container. The Livy entrypoint is written so that it can set spark-defaults configs from env variables, so you can set the Livy container env `LIVY_SPARK_KUBERNETES_NAMESPACE=<desired_namespace>` to change the Spark apps' default namespace. In the new version of the Livy chart I set it to `.Release.Namespace`. And of course you can pass it as an additional conf on app submission within the POST request to Livy, `{ ... "conf": { "spark.kubernetes.namespace": "<desired_namespace>" }, ... }`, to overwrite the defaults (see the sketch after this list).
- Please refer to the customization explanations for Livy: https://github.com/jahstreet/spark-on-kubernetes-helm/tree/master/charts/livy#customizing-livy-server Following that approach you can set any config defaults for both Livy and Spark. If you need to overwrite some, do that on job submission in the POST request body.
- To customize the Driver Pod spec we would need a custom build of Spark installed in the Livy image (Livy just runs spark-submit). I refer to the official releases of Apache Spark and do not see available options for that at present (including adding sidecars). But Spark has configs to set volumes and volume mounts, environment variables, and labels (https://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties), which we can set as default values as I described before.
- Customize the command line arguments for the driver: do you mean application args? You can pass them on job submission in the POST request body: https://livy.incubator.apache.org/docs/latest/rest-api.html Or do you need custom spark-submit script options?
- Why do you need the id of the Livy session, and what kind of macros for both customizing the driver pod and the extra Spark configuration options do you mean? Could you provide an example of a job you want to run? I hope I will be able to show you the available solutions using that example.
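To make the namespace and driver-customization points above concrete, here is a sketch of a batch POST that combines them through standard spark.kubernetes.* properties of Spark 2.4. The namespace "spark-jobs", the label and env values, and the PVC name "spark-data" are illustrative assumptions, not defaults of this PR:

```bash
# Sketch: per-job overrides sent to Livy's /batches endpoint (values are illustrative).
curl -H 'Content-Type: application/json' -X POST "http://localhost:8998/batches" -d '{
  "name": "spark-pi",
  "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "conf": {
    "spark.kubernetes.namespace": "spark-jobs",
    "spark.kubernetes.driver.label.owner": "garo",
    "spark.kubernetes.driverEnv.RUN_ID": "run-42",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path": "/data",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName": "spark-data"
  }
}'
# The same keys can instead be set cluster-wide in /opt/spark/conf/spark-defaults.conf of
# the Livy container (e.g. via the LIVY_SPARK_* env variables handled by the entrypoint).
```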
Thank you very much for the detailed response! I'm just leaving for my Easter holiday, so I won't be able to try again until after that.
I did, however, create this gist showing how we create the Spark drivers in our current workflow: we run Azkaban (like a glorified cron service) which runs our Spark applications. Each application (i.e. a scheduled cron execution) starts a Spark driver pod in Kubernetes. If you look at this gist https://gist.github.com/garo/90c6e69d2430ef7d93ca9f564ba86059 there is first a build of the spark-submit configuration parameters, followed by the YAML for the driver pod.
So I naturally tried to think about how I can use Livy to launch the same image with the same kind of settings. I think that with your explanations I can implement most if not all of these settings except the run_id.
Let's continue this discussion after Easter. Have a great week!
Just to clarify, to be on the same page... When you send a request to Livy, e.g.:
kubectl exec livy-pod -- curl -H 'Content-Type: application/json' -X POST \
  -d '{ "name": "spark-pi", "proxyUser": "livy_user", "numExecutors": 2, "conf": { "spark.kubernetes.container.image": "sasnouskikh/spark:2.4.1-hadoop_3.2.0", "spark.kubernetes.container.image.pullPolicy": "Always", "spark.kubernetes.namespace": "default" }, "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar", "className": "org.apache.spark.examples.SparkPi", "args": [ "1000000" ] }' \
  "http://localhost:8998/batches"
Under the hood Livy just runs spark-submit for you:
spark-submit \
  --master k8s://https://<k8s_api_server>:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=sasnouskikh/spark:2.4.1-hadoop_3.2.0 \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.namespace=default \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar 1000000
Starting from Spark 2.4.0, spark-submit in cluster mode creates the Driver Pod, whose entrypoint runs spark-submit in client mode, just like you try to do in the gist.
So I do not see why you would want to deploy a customized Driver Pod in that particular case.
Most of the --conf options may be moved to defaults, and you will have a pretty compact JSON body.
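For illustration, a sketch of what that trimming might look like, assuming the cluster-wide settings are baked into /opt/spark/conf/spark-defaults.conf of the Livy image (e.g. via the LIVY_SPARK_* env variables handled by the entrypoint):

```bash
# spark-defaults.conf in the Livy container (illustrative):
#   spark.kubernetes.container.image            sasnouskikh/spark:2.4.1-hadoop_3.2.0
#   spark.kubernetes.container.image.pullPolicy Always
#   spark.kubernetes.namespace                  default
# With those defaults in place, the POST body shrinks to:
curl -H 'Content-Type: application/json' -X POST "http://localhost:8998/batches" -d '{
  "name": "spark-pi",
  "numExecutors": 2,
  "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "args": ["1000000"]
}'
```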
A Pushgateway sidecar may be deployed as a separate Pod; just configure the Prometheus sink with the right pushgateway address. All other configs for Driver Pod customization are already covered by the docs for Spark on Kubernetes.
Have a good week!
I'm getting the following error:
19/04/23 16:26:04 INFO LineBufferedStream: 19/04/23 16:26:04 INFO Client: Deployed Spark application livy-session-0 into Kubernetes.
19/04/23 16:26:04 INFO LineBufferedStream: 19/04/23 16:26:04 INFO ShutdownHookManager: Shutdown hook called
19/04/23 16:26:04 INFO LineBufferedStream: 19/04/23 16:26:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-62b7810e-667d-47e7-9940-72f8cd5f91e9
19/04/23 16:26:04 DEBUG InteractiveSession: InteractiveSession 0 app state changed from RUNNING to FINISHED
19/04/23 16:26:04 DEBUG InteractiveSession: InteractiveSession 0 session state change from starting to dead
19/04/23 16:26:10 DEBUG AbstractByteBuf: -Dio.netty.buffer.bytebuf.checkAccessible: true
19/04/23 16:26:10 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.level: simple
19/04/23 16:26:10 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.maxRecords: 4
19/04/23 16:26:10 DEBUG Recycler: -Dio.netty.recycler.maxCapacity.default: 262144
19/04/23 16:26:10 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 16
19/04/23 16:26:10 DEBUG KryoMessageCodec: Decoded message of type org.apache.livy.rsc.rpc.Rpc$SaslMessage (41 bytes)
19/04/23 16:26:10 DEBUG RpcServer$SaslServerHandler: Handling SASL challenge message...
19/04/23 16:26:10 DEBUG RpcServer$SaslServerHandler: Sending SASL challenge response...
19/04/23 16:26:10 DEBUG KryoMessageCodec: Encoded message of type org.apache.livy.rsc.rpc.Rpc$SaslMessage (98 bytes)
19/04/23 16:26:10 DEBUG KryoMessageCodec: Decoded message of type org.apache.livy.rsc.rpc.Rpc$SaslMessage (275 bytes)
19/04/23 16:26:10 DEBUG RpcServer$SaslServerHandler: Handling SASL challenge message...
19/04/23 16:26:10 DEBUG RpcServer$SaslServerHandler: Sending SASL challenge response...
19/04/23 16:26:10 DEBUG KryoMessageCodec: Encoded message of type org.apache.livy.rsc.rpc.Rpc$SaslMessage (45 bytes)
19/04/23 16:26:10 DEBUG RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth.
19/04/23 16:26:10 DEBUG ContextLauncher: New RPC client connected from [id: 0x2ae2b51a, L:/10.233.94.163:10000 - R:/10.233.94.164:39008].
19/04/23 16:26:10 DEBUG KryoMessageCodec: Decoded message of type org.apache.livy.rsc.rpc.Rpc$MessageHeader (5 bytes)
19/04/23 16:26:10 DEBUG KryoMessageCodec: Decoded message of type org.apache.livy.rsc.BaseProtocol$RemoteDriverAddress (94 bytes)
19/04/23 16:26:10 DEBUG RpcDispatcher: [RegistrationHandler] Received RPC message: type=CALL id=0 payload=org.apache.livy.rsc.BaseProtocol$RemoteDriverAddress
19/04/23 16:26:10 DEBUG ContextLauncher: Received driver info for client [id: 0x2ae2b51a, L:/10.233.94.163:10000 - R:/10.233.94.164:39008]: livy-session-0-1556036763266-driver/10000.
19/04/23 16:26:10 DEBUG KryoMessageCodec: Encoded message of type org.apache.livy.rsc.rpc.Rpc$MessageHeader (5 bytes)
19/04/23 16:26:10 DEBUG KryoMessageCodec: Encoded message of type org.apache.livy.rsc.rpc.Rpc$NullMessage (2 bytes)
19/04/23 16:26:10 DEBUG RpcDispatcher: Channel [id: 0x2ae2b51a, L:/10.233.94.163:10000 ! R:/10.233.94.164:39008] became inactive.
19/04/23 16:26:10 ERROR RSCClient: Failed to connect to context.
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1206)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:525)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:510)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:492)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:949)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:208)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:394)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:748)
19/04/23 16:26:10 ERROR RSCClient: RPC error.
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1206)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:525)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:510)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:492)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:949)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:208)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:394)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:748)
19/04/23 16:26:10 INFO RSCClient: Failing pending job d509417c-c894-416d-8218-625b278da8b7 due to shutdown.
Spark is running in a different namespace than Livy. The Service is also created just before this message appears, so it does not seem to be an ordering error. Am I doing something wrong?
@lukatera Good day,
At first glance I see that you are either using a Livy build that is not from this PR (I've fixed a similar issue in that commit), or your Livy and/or Spark is not configured appropriately.
I need to know more about your environment to move forward. Could you please additionally provide some of the following:
- What is your Kubernetes installation and version?
- What Docker images do you use? What version of Spark is running (this Livy was tested with Spark 2.4.0+, 2.3.* wasn't good enough and had some unpleasant bugs)? From what commit have you built Livy (if you did so)?
- What is the content of livy.conf and livy-client.conf (/opt/livy/conf/...)?
- What are the Spark job configs: `kubectl describe configmap <spark-driver-pod-conf-map> -n <spark-job-namespace>`? What is the JSON body you post to create a session?
- What are the Spark Driver Pod logs?
- If you use Helm charts - what are the versions and what are the custom values you provide on install?
- Any other debugging info you feel may be related?
Currently I run the Livy build from this PR's branch with the provided Helm charts and Docker images, both on Minikube for Windows and on Azure AKS, without issues.
I will be happy to help; thanks for the feedback.
Thanks for the help! I was checking out the master branch of your repo instead of this specific one. All good now!
@lukatera Cool, nice to know. Do not hesitate to ask if you face any problems with it.
Great PR! One suggestion: maybe add the authenticated Livy user to both the driver and executor pods' labels. It should be simple enough, since Spark already supports arbitrary labels through the submit command's `spark.kubernetes.driver.label.[LabelName]`.
@igorcalabria Thanks for the feedback; what mechanism for getting the Livy user value do you propose? I see an option of setting those labels with the proxyUser value on Spark job submission from the POST request to Livy. Did you mean that?
@jahstreet It could be that, but I was thinking about the authenticated user (via Kerberos) making the requests. To give you more context, this could be great for resource usage tracking, especially if Livy has more info available about the principal, like groups or even teams.
I'm not familiar with Livy's codebase, but I'm guessing that the param we want is `owner` on the Session classes:
- https://github.com/apache/incubator-livy/blob/master/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L71
- https://github.com/apache/incubator-livy/blob/master/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala#L61
@igorcalabria Oh, I see, will try that, thanks.
Shouldn't the Livy Docker/Helm charts also be part of the Livy repository, since it's most likely that users would want to run Livy in a K8s container while launching Spark on K8s? Maybe it can be added as a follow-up task.
Well, that's a good idea. Once this patch is accepted and merged, I would love to take care of that. Thanks for your feedback.
@jahstreet There's a minor issue when an interactive session is recovered from the filesystem. After a restart, Livy correctly recovers the session, but it stops displaying the Spark master's URL on the "Sessions" tab. The config used was pretty standard:
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = ...
Livy impersonation seems to not be working. I'm trying to use it with Jupyter and sparkmagic with no luck.
%%configure -f
{
"proxyUser": "customUser"
}
However, I'm not familiar enough with Livy to say how this should work and whether it requires a kerberized HDFS cluster.
If I set the `HADOOP_USER_NAME` env variable on the driver and the executor, it runs stuff on top of Hadoop as that user.
I saw this in the driver logs, however:
19/06/06 14:53:11 DEBUG UserGroupInformation: PrivilegedAction as:customUser (auth:PROXY) via root (auth:SIMPLE) from:org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:150)
Actually I'm not familiar with Livy impersonation and do not know how it should behave. Maybe someone can clarify that?
@jahstreet thanks a lot for your contribution. I'm wondering, do you have a design doc about K8s support in Livy?
@jerryshao thx. I'm finalizing it. Will add to the PR next week.
@jahstreet ping
Here is my view of the design concept I planned to implement: https://github.com/jahstreet/spark-on-kubernetes-helm/blob/develop/README.md
Thanks @jahstreet, I will take a look at it. BTW, it would be better to attach the design doc to the JIRA.
@jahstreet, it looks like if the Livy pod restarts, the application link (that points to the driver UI) for the interactive session no longer appears. The log link also doesn't seem to work. Won't Livy poll K8s to update the driver UI URL, or is something broken in the Livy session recovery? This works for the batch session, though.
Added the fix: https://github.com/apache/incubator-livy/pull/167/files#diff-7649a51ad4bddc91b6f1038e06479d41R404-R409
I'm now trying to get this PR to work and I'm facing an issue where the started driver pod fails to connect back to the Livy server on port 10000.
There's a relevant log line from the driver:
RSCDriver:160 - Connecting to: livy.spark.svc:10000
livy.spark.svc is a valid DNS hostname pointing to the Service that fronts Livy, but the Service only maps port 80. If the driver instead had the Livy server pod IP, it would work. I'm not sure how this is supposed to work.
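For context, a minimal sketch of one way such a callback could be made to work. Assumptions: the Service is named livy in namespace spark, and Livy's RPC launcher uses the default livy.rsc.launcher.port.range; this is not part of the PR, just an illustration:

```bash
# Expose the RSC callback port on the existing Livy Service (port name/number are assumptions).
kubectl -n spark patch service livy --type=json -p='[
  {"op": "add", "path": "/spec/ports/-",
   "value": {"name": "rsc-rpc", "port": 10000, "targetPort": 10000}}
]'
# And make sure livy-client.conf advertises an address the driver pod can resolve, e.g.:
#   livy.rsc.launcher.address    = livy.spark.svc
#   livy.rsc.launcher.port.range = 10000~10010
```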