
RFE: Specify worker/executor memory under 'advanced cluster configuration'

Open erikerlandson opened this issue 8 years ago • 33 comments

Executor memory defaults to 1 GB, which is a good default, but it would be nice to have a convenient way to increase it when creating a cluster in the webui.

erikerlandson avatar Jun 04 '17 16:06 erikerlandson

Is this something that is typically done via a configmap on the worker nodes? If so, you can specify a worker configmap on the advanced form already. Is this RFE just to make that process easier (ie: specify the executor memory on the form, which in turn will create the configmap under the covers and use that)?

crobby avatar Jun 05 '17 14:06 crobby

It should be configurable via config-maps as spark.executor.memory. It may be that we want to adhere to that convention. I should try it and see how easy it is.

erikerlandson avatar Jun 05 '17 14:06 erikerlandson

Just thinking out loud here... on one hand it feels a bit heavy-weight to generate a config-map and apply it, for the purpose of setting a single property. On the other hand, we don't really want to get sucked into the business of adding dozens of fields in the UI, corresponding to a large and evolving list of spark properties.

I wonder if it is possible to just add a text input where a person could add spark property settings:

spark.executor.memory = 4096
spark.executor.cores = 4
spark.ui.enabled = false
... etc

Then have the logic of config-file creation run automatically. Easy button style, I guess.
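For illustration, here's a hedged sketch of the kind of translation such a form could do under the covers (property names and values are just examples, not a real implementation):

```shell
# Hypothetical "easy button": take free-form "key = value" lines, as a UI
# text input might collect them, and emit a spark-defaults.conf file.
props='spark.executor.memory = 4g
spark.executor.cores = 4
spark.ui.enabled = false'

# spark-defaults.conf uses whitespace (not "=") between key and value,
# so replace the " = " separator with spaces.
printf '%s\n' "$props" | sed 's/[[:space:]]*=[[:space:]]*/   /' > spark-defaults.conf
```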

erikerlandson avatar Jun 05 '17 14:06 erikerlandson

Maybe we can bake in some extra param on the oshinko-cli (which is what the webui uses under the covers) to specify configs like that. In turn, they can be used to either create a new configmap, or maybe even append to another configmap that is in use.

crobby avatar Jun 05 '17 17:06 crobby

Is there a possibility to define these criteria on the cli? If not, where should I look in the code in order to integrate such capabilities?

NecromuncherDev avatar Apr 16 '18 16:04 NecromuncherDev

I don't believe there is currently a specific flag in the cli to make this happen. I think executor memory can be set in the spark configuration, though, and that functionality (overriding spark config defaults) is supported by both the cli and webui (the configmap would need to be created outside of the webui, either via the oc cli or the openshift console).

crobby avatar Apr 16 '18 16:04 crobby

Ok, here's a partial "how to" for at least part of this. (We'll be creating another issue to make this answer complete and more obvious in the near future.) https://github.com/radanalyticsio/oshinko-cli/blob/master/rest/docs/ClusterConfigs.md documents how a configMap can be created to override some or all of the spark configuration. https://spark.apache.org/docs/latest/configuration.html lists all the possible configs. You should be able to include any of those in the configmap that you create and then ask the cli or webui to use that configmap to spin up your cluster.

crobby avatar Apr 16 '18 19:04 crobby

I'm sorry in advance if this sounds stupid, but I generated a new configmap and used it with the cli like this:

oc create configmap m1-c4  --from-literal=spark.executor.memory=1g --from-literal=spark.executor.cores=4

oshinko create my-m1-c4 --workers=2 --workerconfig=m1-c4

The pods spun up alright, but when I peeked at the spark master webui it said that only 4 cores in total are available instead of the 8 configured (4 cores * 2 workers).

Am I doing something wrong?

P.S. - if in your opinion this should be under a separate issue, just say the word

NecromuncherDev avatar Apr 17 '18 11:04 NecromuncherDev

i think this has to do with how oshinko is provisioning the workers and the way that openshift assigns the cores to each container.

from the spark docs:

The number of cores assigned to each executor is configurable. When spark.executor.cores is explicitly set, multiple executors from the same application may be launched on the same worker if the worker has enough cores and memory. Otherwise, each executor grabs all the cores available on the worker by default, in which case only one executor per application may be launched on each worker during one single schedule iteration.

i have a feeling that since we are not using kubernetes as the cluster scheduler that the workers in your case are doing what is described here. since we do not change the underlying container spawn resources, those containers will still have the default number of cores (and memory) defined by openshift policy.

i'm not sure what the proper solution here is, but this sounds like more about the way spark interacts with openshift for creation of resources. this is a good gap to highlight though, thanks!

elmiko avatar Apr 17 '18 12:04 elmiko

@ThatBeardedDude

Really happy to have you using oshinko, it's especially great to get feedback from someone who is looking with fresh eyes.

Our cluster configmaps for the master and worker are organized around spark config files, so you can't actually drop config values directly into a configmap; they have to be contained in a file. The easiest way to do this in your case is to put your configs in a file and use --from-file:

  1. create spark-defaults.conf file that contains what you want
  2. oc create configmap workerconfig --from-file=spark-defaults.conf

This will cause OpenShift to create a configmap where the filename is the key and the file contents are the value, and when it's mounted as a configmap volume the files will be written to /opt/spark/conf.

Likewise, you can put multiple config files in a directory and do

oc create configmap workerconfig --from-file=myconfigdir

and each file will be added to the configmap
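Putting both steps together, a minimal sketch (the directory and configmap names are just examples; the oc/oshinko lines need a live OpenShift session, so they're shown commented):

```shell
# Put the desired Spark properties in a file named spark-defaults.conf
# inside a directory, then build the configmap from that directory.
mkdir -p myconfigdir
cat > myconfigdir/spark-defaults.conf <<'EOF'
spark.executor.memory   2g
spark.executor.cores    2
EOF

# Against a live OpenShift session (not runnable offline):
#   oc create configmap workerconfig --from-file=myconfigdir
#   oshinko create mycluster --workerconfig=workerconfig --workers=2
```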

Let us know if this works for you.

tmckayus avatar Apr 17 '18 13:04 tmckayus

I will! Thank you @tmckayus very much for this clarification! I can't imagine how much wasted time this simple change in configmap creation method will save.

Maybe this should be added to the proper README as well :)

NecromuncherDev avatar Apr 17 '18 15:04 NecromuncherDev

@ThatBeardedDude

noted :) We have a README about this but it needs to move, probably to the rad.io landing page in a HowTo. I'll add one

tmckayus avatar Apr 17 '18 15:04 tmckayus

@ThatBeardedDude as promised, this will show up on the landing page. Good?

https://github.com/radanalyticsio/radanalyticsio.github.io/pull/187

tmckayus avatar Apr 20 '18 17:04 tmckayus

Amazing!

NecromuncherDev avatar Apr 22 '18 07:04 NecromuncherDev

This is a problem I'm facing while using Oshinko v0.4.6 - if it's already solved in a current unreleased state, I'll be glad to see the solution.

Regarding the gap @elmiko and I discussed earlier, I set up a cluster (following step by step after the README @tmckayus put up) with this configmap:

spark.executor.memory 4g
spark.executor.cores   1

Sadly, I was disappointed to see this in the openshift pod logs:

18/04/23 13:52:30 [INFO] Utils: Successfully started service 'sparkWorker' on port 35716.
18/04/23 13:52:30 [INFO] Worker: Starting Spark worker x.x.x.x:35716 with 2 cores, 1024.0 MB RAM.

Meaning the oshinko create simply ignored my configmap. Any ideas?

NecromuncherDev avatar Apr 23 '18 08:04 NecromuncherDev

@ThatBeardedDude What do you see when you launch a job against your cluster? I just ran a cluster with 2g for executor memory and 2 cores for executors, and when I launch a job against the cluster, I see the following (requesting start with both 2g memory and 2 cores):

18/04/23 13:25:19 INFO ExecutorRunner: Launch command:
"/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java" "-cp"
"/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M"
"-Dspark.driver.port=43525"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"--driver-url" "spark://[email protected]:43525"
"--executor-id" "0"
"--hostname" "172.17.0.7"
"--cores" "2"
"--app-id" "app-20180423132519-0000"
"--worker-url" "spark://[email protected]:42071"

crobby avatar Apr 23 '18 13:04 crobby

Can you share the job you ran so I can try to replicate your results? (So far I've only run a spark-shell command in a pod terminal, and the logs reported starting the worker with the default 2 cores and 1G memory.)

NecromuncherDev avatar Apr 24 '18 16:04 NecromuncherDev

@ThatBeardedDude Here's what I did to get a quick test: get a terminal in one of your worker pods, then run the following

/opt/spark/bin/spark-submit --master $SPARK_MASTER_ADDRESS --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_2.11-2.2.1.jar 500

It's just using the already included spark-examples SparkPi job.

crobby avatar Apr 24 '18 18:04 crobby

@crobby I executed this command on one of my worker pods and the log output was this:

18/04/25 06:02:16 INFO ExecutorRunner: Launch command:
"/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java" "-cp" 
"/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" 
"-Dspark.driver.port=41402" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" 
"--driver-url" "spark://[email protected]:41402" 
"--executor-id" "1" 
"--hostname" "172.17.0.3" 
"--cores" "2" 
"--app-id" "app-20180425060215-0000" 
"--worker-url" "spark://[email protected]:35431"

In short, for some reason my cluster did not accept the configmap I created and is still using the defaults. I have no idea what's causing this.

NecromuncherDev avatar Apr 25 '18 06:04 NecromuncherDev

@ThatBeardedDude Can you share the contents of your configmap?

oc get configmap -o yaml

crobby avatar Apr 25 '18 12:04 crobby

My output for oc get configmap -o yaml is:

apiVersion: v1
items:
- apiVersion: v1
  data:
    w-c2m2: |
      spark.executor.cores 2
      spark.executor.memory 2g
  kind: ConfigMap
  metadata:
    creationTimestamp: 2018-04-25T06:20:33Z
    name: c2m2
    namespace: flare-cli
    resourceVersion: "83199"
    selfLink: /api/v1/namespaces/flare-cli/configmaps/c2m2
    uid: c2cb52a8-4850-11e8-88d8-72a9dfa938ad
- apiVersion: v1
  data:
    c4m1: |
      spark.executor.cores    4
      spark.executor.memory   1g
  kind: ConfigMap
  metadata:
    creationTimestamp: 2018-04-24T14:29:23Z
    name: c4m1
    namespace: flare-cli
    resourceVersion: "79270"
    selfLink: /api/v1/namespaces/flare-cli/configmaps/c4m1
    uid: e27b7720-47cb-11e8-ba2c-d2a92405b963
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I created the configmap (file at ~/oshinko-spark.conf.d/c2m2/w-c2m2) with this content:

spark.executor.cores 2
spark.executor.memory 2g

And this command: oc create configmap c2m2 --from-file=~/oshinko-spark.conf.d/c2m2

I set up the cluster with this command: oshinko create c2m2 --workerconfig='c2m2' --masterconfig='c2m2' --workers=2

I do have other confmap directories under ~/oshinko-spark.conf.d/

NecromuncherDev avatar Apr 25 '18 13:04 NecromuncherDev

@ThatBeardedDude Ok, I think there is an issue with the configmap itself. When I view the contents of my configmap, here is what I get...

oc get configmap sparkconfig -o yaml

apiVersion: v1
data:
  spark-defaults.conf: |
    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

    # Default system properties included when running spark-submit.
    # This is useful for setting default environmental settings.

    # Example:
    # spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    # spark.serializer                 org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory              5g
    # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.ui.reverseProxy              true
    spark.ui.reverseProxyUrl           /
    spark.executor.memory              2g
    spark.executor.cores               2
kind: ConfigMap
metadata:
  creationTimestamp: 2018-04-25T14:05:26Z
  name: sparkconfig
  namespace: test
  resourceVersion: "3082668"
  selfLink: /api/v1/namespaces/test/configmaps/sparkconfig
  uid: b43ca653-4891-11e8-afeb-c85b764bc0a0

When I issue my create configmap command, my value for --from-file is a directory that contains spark-defaults.conf. I think your value c2m2 is actually a file itself, and I suspect that is why you're not seeing the desired result. Can you create a directory with a file named spark-defaults.conf that contains your desired settings, then run the create configmap command on that directory? (Yes, the --from-file naming can be misleading and took me a second to grasp as well; it's perhaps a little too flexible.) I think that will get you where you want to be.
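Concretely, using the paths from your earlier comment, the fix might look like this (a sketch; the cluster commands need a live session, so they're shown commented):

```shell
# The file inside the configmap source directory must be named
# spark-defaults.conf so Spark picks it up once it's mounted.
mkdir -p "$HOME/oshinko-spark.conf.d/c2m2"
cat > "$HOME/oshinko-spark.conf.d/c2m2/spark-defaults.conf" <<'EOF'
spark.executor.cores    2
spark.executor.memory   2g
EOF

# Then recreate the configmap and the cluster (requires a live cluster):
#   oc delete configmap c2m2
#   oc create configmap c2m2 --from-file="$HOME/oshinko-spark.conf.d/c2m2"
#   oshinko create c2m2 --workerconfig=c2m2 --masterconfig=c2m2 --workers=2
```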

crobby avatar Apr 25 '18 14:04 crobby

My c2m2 is a directory.

$ tree oshinko-spark.conf.d/

oshinko-spark.conf.d/
├── c2m2
│   └── w-c2m2
└── c4m1
    └── w-c4m1

Maybe it can't operate if spark-defaults.conf isn't in the directory?

NecromuncherDev avatar Apr 25 '18 14:04 NecromuncherDev

Yes, I believe we key off of the name "spark-defaults.conf". This is necessary because other things (like logging via log4j.properties) can also be set in the same configmap.

crobby avatar Apr 25 '18 14:04 crobby

It's more than that, it's actually what Spark itself looks for. The entries in the configmap need to be actual spark configuration files, with the "key" as the filename and the value as the contents. Those "files" are written to /opt/spark/conf/. I'll update the how-do-i on rad io to make this more clear.

tmckayus avatar Apr 25 '18 14:04 tmckayus

After changing to the spark-defaults.conf standard, I did some tests and discovered something interesting.

I ran the sparkPi job on the default cluster (2 cores and 1g RAM per executor, 2 workers) and it worked perfectly - so no surprise here.

After that I created a c4m1 cluster (4 cores and 1g RAM per executor, 1 worker), but it failed with: WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. This had me thinking - is there a way to define the worker's resource limits, and not just the executors'? (I'm sure there is and I just haven't found it yet.)

To convince myself that this was indeed the problem, I ran sparkPi again, now on a c1m1 cluster (1 core and 1g RAM per executor, 2 workers), and it worked! Finally a configmap of my making worked (joy). What I saw in the spark UI confirmed that the worker size itself is the problem, because the resources there were: Cores in use: 4 Total, 2 Used; Memory in use: 2.0 GB Total, 2.0 GB Used

So two questions come to mind:

  1. How do I increase (or configure at all) spark worker resources?
  2. Why is this limitation not written anywhere? Or alternatively: how did I miss it?

Any help with these questions? (At least the first one)

NecromuncherDev avatar Apr 25 '18 15:04 NecromuncherDev

It looks like there are some environment variables that might be of use to us. SPARK_WORKER_CORES and SPARK_WORKER_MEMORY appear to be settable from the environment (http://spark.apache.org/docs/latest/spark-standalone.html), but I don't see them as something that can appear as a config. I have not tried such a thing yet, but I will do some experimenting around this. If those are indeed settable and useful, we may want an enhancement to make them easier to set.

crobby avatar Apr 25 '18 16:04 crobby

Ok, a quick tweak and test (For my purposes, I modified our oshinko-webui to set these env vars at cluster-create time) seems to indicate that setting the env vars SPARK_WORKER_CORES and SPARK_WORKER_MEMORY does have an impact. On a cluster with my tweaks to set the env vars (2G for memory, and 1 for cores), in the Spark UI for the worker, I see the following.

ID: worker-20180425163333-172.17.0.10-44127
Master URL: spark://172.17.0.11:7077
Cores: 1 (0 Used)
Memory: 2.0 GB (0.0 B Used)

I have not run many jobs against my updated cluster to see what the impact may be, but my simple sparkPi test did still run.

Is the above result something you'd be interested in working with? If it is, I may be able to publish my updated oshinko-webui image for you to use (along with instructions on how to use it). Ideally, you could share your results with us to determine if it's a feature we want to add to the whole oshinko suite (CLI and our other web UI).

crobby avatar Apr 25 '18 16:04 crobby

This is exactly what I was looking for! It would enable creating spark clusters designed for specific jobs. That, in fact, was my original purpose when I changed spark.executor.memory/cores, and if what you say holds up, I think this would come in very handy for me (and probably other oshinko users).

NecromuncherDev avatar Apr 25 '18 17:04 NecromuncherDev

Note that I believe this can still be done with configmaps, because you can set that stuff in spark-env.sh if I'm not mistaken.
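For example, an untested sketch (whether the oshinko images actually source spark-env.sh from the mounted configmap is an assumption to verify):

```shell
# Untested sketch: set worker resources via spark-env.sh in the same
# configmap source directory as spark-defaults.conf.
mkdir -p workerconfig
cat > workerconfig/spark-env.sh <<'EOF'
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=4g
EOF
cat > workerconfig/spark-defaults.conf <<'EOF'
spark.executor.cores    2
spark.executor.memory   2g
EOF

# Against a live cluster, both files become keys in one configmap:
#   oc create configmap workerconfig --from-file=workerconfig
#   oshinko create mycluster --workerconfig=workerconfig --workers=2
```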

tmckayus avatar Apr 25 '18 17:04 tmckayus