
Issue: CML Spark configuration overwritten by defaults from Zeppelin image

Open · maxhardt opened this issue on Mar 28, 2023 · 0 comments

✅ CML Spark configuration from the Workbench runtime (with Spark 3.2 enabled):

spark._sc.getConf().getAll()

[('spark.eventLog.enabled', 'true'),
 ('spark.network.crypto.enabled', 'true'),
 ('spark.sql.hive.hwc.execution.mode', 'spark'),
 ('spark.kubernetes.driver.pod.name', 'ikgvegy1ovk9i3l7'),
 ('spark.kubernetes.namespace', 'mlx-user-9'),
 ('spark.yarn.access.hadoopFileSystems',
  's3a://goes-se-sandbox01/warehouse/tablespace/external/hive'),
 ('spark.kerberos.renewal.credentials', 'ccache'),
 ('spark.sql.catalog.spark_catalog',
  'org.apache.iceberg.spark.SparkSessionCatalog'),
 ('spark.dynamicAllocation.maxExecutors', '49'),
 ('spark.eventLog.dir', 'file:///sparkeventlogs'),
 ('spark.driver.bindAddress', '100.100.196.146'),
 ('spark.kubernetes.driver.annotation.cluster-autoscaler.kubernetes.io/safe-to-evict',
  'false'),
 ('spark.ui.port', '20049'),
 ('spark.kubernetes.executor.podNamePrefix', 'cdsw-ikgvegy1ovk9i3l7'),
 ('spark.sql.extensions',
  'com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions'),
 ('spark.kubernetes.executor.annotation.cluster-autoscaler.kubernetes.io/safe-to-evict',
  'false'),
 ('spark.executor.memory', '1g'),
 ('spark.kubernetes.container.image',
  'docker.repository.cloudera.com/cloudera/cdsw/ml-runtime-workbench-python3.7-standard:2022.11.2-b2'),
 ('spark.app.id', 'spark-application-1679933817531'),
 ('spark.app.startTime', '1679933815644'),
 ('spark.io.encryption.enabled', 'true'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.submit.deployMode', 'client'),
 ('spark.master', 'k8s://https://172.20.0.1:443'),
 ('spark.kubernetes.executor.podTemplateFile', '/tmp/spark-executor.json'),
 ('spark.jars', '/opt/spark/optional-lib/iceberg-spark-runtime.jar'),
 ('spark.sql.warehouse.dir',
  's3a://goes-se-sandbox01/warehouse/tablespace/external/hive'),
 ('spark.dynamicAllocation.shuffleTracking.enabled', 'true'),
 ('spark.driver.memory', '6605m'),
 ('spark.kubernetes.executor.config.dir', '/var/spark/conf'),
 ('spark.repl.local.jars',
  'file:///runtime-addons/spark320-17-hf1-f6inq5/opt/spark/optional-lib/iceberg-spark-runtime-3.2_2.12-0.14.1.1.17.7215.0-31.jar'),
 ('spark.executor.id', 'driver'),
 ('spark.ui.proxyRedirectUri',
  'https://spark-ikgvegy1ovk9i3l7.ml-a8a67fa2-82d.se-sandb.a465-9q4k.cloudera.site'),
 ('spark.app.initial.jar.urls',
  'spark://100.100.196.146:39291/jars/iceberg-spark-runtime-3.2_2.12-0.14.1.1.17.7215.0-31.jar'),
 ('spark.hadoop.yarn.resourcemanager.principal', 'mengelhardt'),
 ('spark.yarn.rmProxy.enabled', 'false'),
 ('spark.driver.host', '100.100.196.146'),
 ('spark.sql.catalogImplementation', 'hive'),
 ('spark.rdd.compress', 'True'),
 ('spark.kryo.registrator',
  'com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator'),
 ('spark.submit.pyFiles', ''),
 ('spark.dynamicAllocation.enabled', 'true'),
 ('spark.deploy.mode', 'client'),
 ('spark.executor.cores', '1'),
 ('spark.driver.port', '39291'),
 ('spark.ui.allowFramingFrom',
  'https://ml-a8a67fa2-82d.se-sandb.a465-9q4k.cloudera.site'),
 ('spark.app.name', 'None'),
 ('spark.ui.showConsoleProgress', 'true'),
 ('spark.sql.catalog.spark_catalog.type', 'hive'),
 ('spark.authenticate', 'true')]
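
For reference, the effective configuration above can be snapshotted to a file so it can later be diffed against the Zeppelin session. This is just a minimal sketch; the output path under /home/cdsw (assumed here to be the CML project home) is an arbitrary choice:

# Snapshot the Workbench session's effective Spark config for later comparison.
# The output path is an assumption; any writable location in the project works.
with open("/home/cdsw/workbench-spark-conf.txt", "w") as f:
    for key, value in sorted(spark._sc.getConf().getAll()):
        f.write(f"{key}={value}\n")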

🙅 CML Spark configuration from the Zeppelin runtime (with Spark 3.2 enabled):

%pyspark
for c in sc.getConf().getAll():
    print(c)

('spark.eventLog.enabled', 'true')
('spark.jars', 'file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar')
('spark.network.crypto.enabled', 'true')
('zeppelin.pyspark.useIPython', 'true')
('zeppelin.spark.concurrentSQL', 'true')
('spark.dynamicAllocation.maxExecutors', '49')
('spark.eventLog.dir', 'file:///sparkeventlogs')
('spark.kubernetes.driver.pod.name', '75rf1cajaf041wgb')
('spark.driver.memory', '1g')
('zeppelin.spark.scala.version', '2.12')
('zeppelin.spark.run.asLoginUser', 'true')
('zeppelin.interpreter.connection.poolsize', '100')
('spark.kubernetes.executor.podNamePrefix', 'cdsw-75rf1cajaf041wgb')
('spark.webui.yarn.useProxy', 'false')
('spark.kubernetes.executor.annotation.cluster-autoscaler.kubernetes.io/safe-to-evict', 'false')
('spark.executor.memory', '1g')
('spark.useHiveContext', 'true')
('spark.master', 'local[*]')
('spark.app.startTime', '1679932785590')
('SPARK_HOME', '/opt/spark')
('zeppelin.kotlin.shortenTypes', 'true')
('spark.driver.extraJavaOptions', ' -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///opt/zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/opt/zeppelin/logs/zeppelin-interpreter-spark-shared_process-cdsw-75rf1cajaf041wgb.log')
('spark.driver.port', '44411')
('zeppelin.R.shiny.portRange', ':')
('spark.app.initial.jar.urls', 'spark://100.100.196.144:44411/jars/spark-interpreter-0.10.1.jar')
('spark.kubernetes.executor.config.dir', '/var/spark/conf')
('zeppelin.spark.printREPLOutput', 'true')
('zeppelin.spark.enableSupportedVersionCheck', 'true')
('zeppelin.spark.maxResult', '1000')
('spark.executor.id', 'driver')
('zeppelin.R.image.width', '100%')
('zeppelin.spark.ui.hidden', 'false')
('zeppelin.spark.deprecatedMsg.show', 'true')
('spark.sql.catalogImplementation', 'hive')
('spark.ui.allowFramingFrom', 'https://ml-a8a67fa2-82d.se-sandb.a465-9q4k.cloudera.site')
('zeppelin.spark.sql.interpolation', 'false')
('spark.authenticate', 'true')
('spark.ui.proxyRedirectUri', 'https://spark-75rf1cajaf041wgb.ml-a8a67fa2-82d.se-sandb.a465-9q4k.cloudera.site')
('spark.kubernetes.namespace', 'mlx-user-9')
('PYSPARK_DRIVER_PYTHON', '/usr/local/bin/python3')
('spark.driver.extraClassPath', ':/opt/zeppelin/local-repo/spark/*:/opt/zeppelin/interpreter/spark/*:::/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.10.1.jar:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar:/etc/hadoop/conf')
('spark.kerberos.renewal.credentials', 'ccache')
('zeppelin.R.cmd', 'R')
('spark.kubernetes.container.image', 'maxhardt90/cml-runtime-zeppelin:0.1')
('spark.kubernetes.driver.annotation.cluster-autoscaler.kubernetes.io/safe-to-evict', 'false')
('spark.ui.port', '20049')
('zeppelin.interpreter.output.limit', '102400')
('spark.executor.instances', '2')
('spark.io.encryption.enabled', 'true')
('PYSPARK_PYTHON', '/usr/local/bin/python3')
('spark.repl.class.outputDir', '/tmp/spark1760787753920486345')
('spark.submit.deployMode', 'client')
('zeppelin.R.knitr', 'true')
('zeppelin.interpreter.localRepo', '/opt/zeppelin/local-repo/spark')
('spark.kubernetes.executor.podTemplateFile', '/tmp/spark-executor.json')
('spark.driver.cores', '1')
('zeppelin.spark.useHiveContext', 'true')
('spark.sql.warehouse.dir', 's3a://goes-se-sandbox01/warehouse/tablespace/external/hive')
('spark.dynamicAllocation.shuffleTracking.enabled', 'true')
('spark.driver.bindAddress', '100.100.196.144')
('spark.driver.host', '100.100.196.144')
('spark.scheduler.mode', 'FAIR')
('zeppelin.R.render.options', "out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, fig.retina = 2")
('spark.app.name', 'spark-shared_process')
('spark.yarn.rmProxy.enabled', 'false')
('spark.app.id', 'local-1679932786669')
('spark.submit.pyFiles', '')
('zeppelin.spark.concurrentSQL.max', '10')
('spark.repl.class.uri', 'spark://100.100.196.144:44411/classes')
('spark.dynamicAllocation.enabled', 'true')
('spark.deploy.mode', 'client')
('zeppelin.spark.sql.stacktrace', 'true')
('zeppelin.spark.scala.color', 'true')
('spark.executor.cores', '1')
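
To make the dropped and overridden keys explicit, the two dumps can be diffed directly from a %pyspark paragraph. A sketch, assuming the Workbench snapshot from above has been copied into the Zeppelin container as /tmp/workbench-spark-conf.txt:

%pyspark
# Compare the running Zeppelin session config against the Workbench snapshot.
# The snapshot path is an assumption; adjust to wherever the file was copied.
workbench = {}
with open("/tmp/workbench-spark-conf.txt") as f:
    for line in f:
        key, _, value = line.rstrip("\n").partition("=")
        workbench[key] = value

zeppelin = dict(sc.getConf().getAll())

# Prints keys that CML sets in the Workbench runtime but that the Zeppelin
# interpreter drops or overrides, e.g. spark.master, spark.sql.extensions,
# spark.yarn.access.hadoopFileSystems.
for key in sorted(workbench):
    if workbench[key] != zeppelin.get(key):
        print(key, "| workbench:", workbench[key], "| zeppelin:", zeppelin.get(key))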

▶️ Ask

The CML Zeppelin Docker image should be modified so that Zeppelin does not overwrite the CML-provided Spark configuration with its own default Spark configuration. As the dumps above show, the Zeppelin session currently falls back to spark.master=local[*] and loses the Kubernetes, Iceberg, and data lake settings (e.g. spark.sql.extensions, spark.yarn.access.hadoopFileSystems) that CML injects in the Workbench runtime.
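
Until the image handles this, a possible stop-gap from a %pyspark paragraph is to rebuild the session from the CML-generated spark-defaults.conf rather than the Zeppelin interpreter defaults. This is only a sketch under two assumptions: that the CML defaults are reachable via $SPARK_CONF_DIR inside the container, and that restarting the SparkSession inside the interpreter is acceptable:

%pyspark
import os
from pyspark.sql import SparkSession

# Where the CML-generated defaults live is an assumption; the executor side
# above uses /var/spark/conf, the driver side may mount them elsewhere.
conf_dir = os.environ.get("SPARK_CONF_DIR", "/etc/spark/conf")
defaults_path = os.path.join(conf_dir, "spark-defaults.conf")

spark.stop()  # drop the local[*] session created by the Zeppelin interpreter

builder = SparkSession.builder.appName("zeppelin-cml-spark")
with open(defaults_path) as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # spark-defaults.conf separates key and value by whitespace
        parts = line.split(None, 1)
        if len(parts) == 2:
            builder = builder.config(parts[0], parts[1].strip())

spark = builder.getOrCreate()
print(spark.sparkContext.master)  # should now report k8s://..., not local[*]

The real fix still belongs in the image itself (the Zeppelin Spark interpreter picking up the CML-provided configuration instead of shipping its own), but the sketch above at least shows which settings need to survive.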

maxhardt · Mar 28 '23 13:03