[template] create templates for use in generating actions
This PR should resolve #1276 and is an attempt to better address the problem space of #1030.
I believe #1259 could be implemented more easily with this change, but its dependency on rebooting is antithetical to Dataproc in many ways, so it has not been included. I will meet with NVIDIA and the Dataproc engineering team to troubleshoot the problem.
This PR refactors code out of the GPU-acceleration-related and Dask-related actions and into files under the templates/ directory of the repository. A set of PRs is rebased onto this branch:
- #1290 - [gpu] Exercise new template-generated GPU driver installer
- #1288 - [rapids] generated from template
- #1287 - [dask] generating dask/dask.sh from template
- #1284 - [spark-rapids] generate spark-rapids/spark-rapids.sh from template
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
using the test suite I just cleaned up for #1275
/gcbrun
2.1-debian11 failure:
2025-01-03T03:18:35.157639402Z AssertionError: 1 != 0 : Failed to execute command:
2025-01-03T03:18:35.157650162Z gcloud dataproc jobs submit spark --cluster=test-gpu-standard-2-1-20250103-030909-kdee --region=us-central1 --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar --class=org.apache.spark.examples.ml.JavaIndexToStringExample --properties=spark.executor.resource.gpu.amount=1,spark.executor.cores=6,spark.executor.memory=4G,spark.task.resource.gpu.amount=0.333,spark.task.cpus=2,spark.yarn.unmanagedAM.enabled=false
2025-01-03T03:18:35.157660172Z STDOUT:
2025-01-03T03:18:35.157694322Z
2025-01-03T03:18:35.157706472Z STDERR:
2025-01-03T03:18:35.157715992Z Job [474683bad64a45e8af6cc00ccc9695ae] submitted.
2025-01-03T03:18:35.157726222Z Waiting for job output...
2025-01-03T03:18:35.157735722Z 25/01/03 03:14:42 INFO SparkEnv: Registering MapOutputTracker
2025-01-03T03:18:35.157768422Z 25/01/03 03:14:42 INFO SparkEnv: Registering BlockManagerMaster
2025-01-03T03:18:35.157778362Z 25/01/03 03:14:42 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
2025-01-03T03:18:35.157787802Z 25/01/03 03:14:42 INFO SparkEnv: Registering OutputCommitCoordinator
2025-01-03T03:18:35.157797152Z 25/01/03 03:14:43 INFO DataprocSparkPlugin: Registered 128 driver metrics
2025-01-03T03:18:35.157805932Z 25/01/03 03:14:43 INFO ShimLoader: Loading shim for Spark version: 3.3.2
2025-01-03T03:18:35.157815022Z 25/01/03 03:14:43 INFO ShimLoader: Complete Spark build info: 3.3.2, https://bigdataoss-internal.googlesource.com/third_party/apache/spark, dataproc-branch-3.3.2, 5672c094ffe3ff9aa967db7b81163e1cc586a093, 2024-10-23T22:06:45Z
2025-01-03T03:18:35.157824862Z 25/01/03 03:14:43 INFO ShimLoader: findURLClassLoader found a URLClassLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0
2025-01-03T03:18:35.157836082Z 25/01/03 03:14:43 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@61ab89b0 with the URLs: jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark3xx-common/, jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark332/
2025-01-03T03:18:35.157845492Z 25/01/03 03:14:43 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0 updated successfully
2025-01-03T03:18:35.157869502Z 25/01/03 03:14:43 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@61ab89b0 with the URLs: jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark3xx-common/, jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark332/
2025-01-03T03:18:35.157880132Z 25/01/03 03:14:43 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0 updated successfully
2025-01-03T03:18:35.157891322Z 25/01/03 03:14:43 INFO RapidsPluginUtils: RAPIDS Accelerator build: {date=2023-10-05T09:57:39Z, cudf_version=23.08.0, version=23.08.2, user=, branch=HEAD, url=https://github.com/NVIDIA/spark-rapids.git, revision=56da18a1be0148025cb00ced2ffe039fbf9c3391}
2025-01-03T03:18:35.157900352Z 25/01/03 03:14:43 INFO RapidsPluginUtils: RAPIDS Accelerator JNI build: {date=2023-08-10T03:31:37Z, version=23.08.0, user=, branch=HEAD, url=https://github.com/NVIDIA/spark-rapids-jni.git, revision=73fcd5ce22a622e5937a613bc5c4a1b32a40aec1}
2025-01-03T03:18:35.157909062Z 25/01/03 03:14:43 INFO RapidsPluginUtils: cudf build: {date=2023-08-10T03:31:37Z, version=23.08.0, user=, branch=HEAD, url=https://github.com/rapidsai/cudf.git, revision=8150d38e080c8fb021921ade83fe3aa3be04b47d}
2025-01-03T03:18:35.157917332Z 25/01/03 03:14:43 WARN RapidsPluginUtils: RAPIDS Accelerator 23.08.2 using cudf 23.08.0.
2025-01-03T03:18:35.157926612Z 25/01/03 03:14:43 WARN RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.
2025-01-03T03:18:35.157935632Z 25/01/03 03:14:43 WARN RapidsPluginUtils: The current setting of spark.task.resource.gpu.amount (0.333) is not ideal to get the best performance from the RAPIDS Accelerator plugin. It's recommended to be 1/{executor core count} unless you have a special use case.
2025-01-03T03:18:35.157944382Z 25/01/03 03:14:43 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
2025-01-03T03:18:35.157954512Z 25/01/03 03:14:43 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
2025-01-03T03:18:35.157963672Z 25/01/03 03:14:44 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at test-gpu-standard-2-1-20250103-030909-kdee-m.us-central1-f.c.cloud-dataproc-ci.internal./10.128.0.50:8032
2025-01-03T03:18:35.157972112Z 25/01/03 03:14:44 INFO AHSProxy: Connecting to Application History server at test-gpu-standard-2-1-20250103-030909-kdee-m.us-central1-f.c.cloud-dataproc-ci.internal./10.128.0.50:10200
2025-01-03T03:18:35.157991762Z 25/01/03 03:14:44 INFO Configuration: found resource resource-types.xml at file:/etc/hadoop/conf.empty/resource-types.xml
2025-01-03T03:18:35.158000832Z 25/01/03 03:14:44 INFO ResourceUtils: Adding resource type - name = yarn.io/gpu, units = , type = COUNTABLE
2025-01-03T03:18:35.158009842Z 25/01/03 03:14:46 INFO YarnClientImpl: Submitted application application_1735873832026_0001
2025-01-03T03:18:35.158020202Z 25/01/03 03:14:56 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
2025-01-03T03:18:35.158029782Z 25/01/03 03:15:00 WARN GpuOverrides:
2025-01-03T03:18:35.158038982Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158050092Z @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158059502Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158069402Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158101482Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158112172Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158123202Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158131912Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158140402Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158148742Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158158062Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158166272Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158186702Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158195922Z @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158204532Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158213622Z !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158222172Z @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158230522Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158239022Z @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158247572Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158256122Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158265132Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158284682Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158293312Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158302412Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158311202Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158327352Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158336902Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158345412Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158353802Z @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.158372552Z @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.158381252Z ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.158390802Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158399782Z
2025-01-03T03:18:35.158408922Z 25/01/03 03:15:00 INFO GpuOverrides: Plan conversion to the GPU took 82.60 ms
2025-01-03T03:18:35.158418282Z 25/01/03 03:15:00 WARN GpuOverrides:
2025-01-03T03:18:35.158427332Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158436242Z @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158444812Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158453752Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158462122Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158470662Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158478612Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158487022Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158495642Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158504672Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158513622Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158522272Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158531532Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158540392Z @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158563802Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158572982Z !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158581902Z @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158591402Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158600182Z @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158609412Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158617432Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158626352Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158635312Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158644272Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158653102Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158661752Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158690872Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158701652Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158710832Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158719952Z @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.158729072Z @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.158738432Z ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.158759192Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158768182Z
2025-01-03T03:18:35.158777272Z 25/01/03 03:15:00 INFO GpuOverrides: Plan conversion to the GPU took 7.28 ms
2025-01-03T03:18:35.158786472Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 1.26 ms
2025-01-03T03:18:35.158795052Z 25/01/03 03:15:01 WARN GpuOverrides:
2025-01-03T03:18:35.158804352Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158827882Z @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158838842Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158848372Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158857722Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158867352Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158876662Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158888172Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158897632Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158906542Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158915172Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158923792Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158932872Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158941622Z @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158962062Z @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158971042Z !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158980162Z @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158988692Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158997872Z @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.159030592Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.159043982Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.159053702Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.159062732Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159081082Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159091482Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159100782Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.159109962Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159119532Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159128692Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159138382Z @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.159147352Z @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.159156532Z ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.159177462Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159186652Z
2025-01-03T03:18:35.159195722Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 4.75 ms
2025-01-03T03:18:35.159204872Z 25/01/03 03:15:01 INFO GpuOverrides: GPU plan transition optimization took 13.66 ms
2025-01-03T03:18:35.159214022Z 25/01/03 03:15:01 WARN GpuOverrides:
2025-01-03T03:18:35.159223582Z !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.159233342Z @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.159242602Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.159252392Z @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.159261702Z ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.159282182Z ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.159293432Z ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.159302532Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159311742Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159321662Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159333852Z ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.159343432Z ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159353252Z ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159362232Z ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159371352Z @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.159390812Z @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.159400042Z ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.159408912Z @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159417562Z
2025-01-03T03:18:35.159426412Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 4.15 ms
2025-01-03T03:18:35.159435922Z 25/01/03 03:15:01 INFO GpuOverrides: GPU plan transition optimization took 7.25 ms
2025-01-03T03:18:35.159446272Z 25/01/03 03:15:43 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container from a bad node: container_1735873832026_0001_01_000003 on host: test-gpu-standard-2-1-20250103-030909-kdee-w-0.us-central1-f.c.cloud-dataproc-ci.internal. Exit status: 1. Diagnostics: [2025-01-03 03:15:43.085]Exception from container-launch.
2025-01-03T03:18:35.159455422Z Container id: container_1735873832026_0001_01_000003
2025-01-03T03:18:35.159464442Z Exit code: 1
2025-01-03T03:18:35.159472832Z Exception message: Launch container failed
2025-01-03T03:18:35.159481212Z Shell error output: Nonzero exit code=1, error message='Invalid argument number'
/gcbrun
/gcbrun
/gcbrun
/gcbrun
well that's good news, then.
/gcbrun
/gcbrun
/gcbrun
/gcbrun
rebased on master
/gcbrun
What changes will be required in the steps to create a cluster with init scripts using templates after this PR?
Currently, the steps include:
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/spark-rapids/spark-rapids.sh
Referencing the example: Create a Dataproc cluster using T4s.
Update: I just learnt about [spark-rapids] generate spark-rapids/spark-rapids.sh from template. I assume we programmatically regenerate spark-rapids.sh whenever a change is made to the template.
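For reference, the current step can be sketched as a dry-run script that only prints the command. This is a minimal sketch, not the README's full example: the cluster name and default region are hypothetical, and the machine type, image version, and other flags from the T4 example are omitted.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical defaults for illustration; substitute your own values.
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-example-gpu-cluster}"

# Print the cluster-creation command, one argument per line, for review.
# Nothing is executed; the init action path is the one quoted above.
print_create_cmd() {
  printf '%s \\\n' \
    "gcloud dataproc clusters create ${CLUSTER_NAME}" \
    "  --region=${REGION}" \
    "  --worker-accelerator=type=nvidia-tesla-t4,count=1"
  printf '%s\n' \
    "  --initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/spark-rapids/spark-rapids.sh"
}

print_create_cmd
```

If the generated spark-rapids.sh keeps the same path in the bucket, this invocation would presumably not need to change.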
Those instructions seem right, but initialization may take less time with the new versions: much of the work is now cached, and when memory is sufficient, installation uses RAM disks.
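The RAM disk decision could look roughly like the sketch below. It is an illustration only, not the code in the templates: the 32 GiB threshold, the `/mnt/shm` mount point, and the function names are all assumptions, and the mount command is printed rather than executed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical threshold: use a RAM disk only at or above this much memory.
MIN_MEM_KB_FOR_RAMDISK=$((32 * 1024 * 1024))  # 32 GiB, expressed in kB

# Succeed when the given MemTotal (in kB) meets the threshold.
should_use_ramdisk() {
  [ "$1" -ge "${MIN_MEM_KB_FOR_RAMDISK}" ]
}

# Read MemTotal from /proc/meminfo (Linux); print 0 if it is unavailable.
total_mem_kb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
  else
    echo 0
  fi
}

# Choose a working directory based on the memory figure passed in.
choose_workdir() {
  if should_use_ramdisk "$1"; then
    echo "/mnt/shm"   # hypothetical tmpfs mount point
  else
    echo "/tmp"
  fi
}

mem_kb="$(total_mem_kb)"
workdir="$(choose_workdir "${mem_kb}")"
echo "MemTotal=${mem_kb} kB, working directory: ${workdir}"
if [ "${workdir}" = "/mnt/shm" ]; then
  # Printed only; a real installer would need root to mount tmpfs.
  echo "would run: mount -t tmpfs tmpfs ${workdir}"
fi
```

Keeping the threshold check in its own function makes the decision testable without touching the machine's real memory or mounts.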
Maybe mention that with the new custom images, Secure Boot can be enabled. The new custom image script requires that the Secret Manager API be enabled for the project.
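Enabling the API is a one-line `gcloud services enable` call. The sketch below wraps it in a dry-run helper so that it prints the command by default; the project ID and the `run` and `enable_secret_manager` helpers are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical project ID for illustration.
PROJECT_ID="${PROJECT_ID:-example-project}"
# Default to a dry run so the command is printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Enable the Secret Manager API for the project.
enable_secret_manager() {
  run gcloud services enable secretmanager.googleapis.com --project="${PROJECT_ID}"
}

enable_secret_manager
```

Set `DRY_RUN=0` to run the command for real once the project ID is correct.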
And yes, new actions will be generated from templates on each release; releases are now versioned.
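To illustrate the general idea of generating an action from a template, here is a minimal sketch that inlines shared fragments into a script. It is not the generator used in this PR: the `@@INCLUDE ...@@` directive, the file names, and the `render_template` function are all invented for the example.

```shell
#!/usr/bin/env bash
set -eu

# Expand hypothetical "@@INCLUDE <file>@@" lines in a template by inlining
# the named fragment from the given directory; other lines pass through.
render_template() {
  template="$1"
  fragment_dir="$2"
  while IFS= read -r line || [ -n "${line}" ]; do
    case "${line}" in
      "@@INCLUDE "*"@@")
        name="${line#@@INCLUDE }"
        name="${name%@@}"
        cat "${fragment_dir}/${name}"
        ;;
      *)
        printf '%s\n' "${line}"
        ;;
    esac
  done < "${template}"
}

# Demonstration with throwaway files in a temporary directory.
demo_dir="$(mktemp -d)"
trap 'rm -rf "${demo_dir}"' EXIT
mkdir -p "${demo_dir}/common"
printf 'function os_id() { echo "debian"; }\n' > "${demo_dir}/common/util_functions"
printf '#!/bin/bash\n@@INCLUDE common/util_functions@@\nos_id\n' > "${demo_dir}/example.sh.in"

# Render the template into a standalone script and show the result.
render_template "${demo_dir}/example.sh.in" "${demo_dir}" > "${demo_dir}/example.sh"
cat "${demo_dir}/example.sh"
```

The point of the design is that a shared fragment is edited once and every generated action picks up the change at the next release.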
TODO: apply these changes from the GPU driver installer and Cloud SQL Proxy to the common templates:
- https://github.com/GoogleCloudDataproc/initialization-actions/pull/1275/commits/e56ddd0fbef897c5fb2ab2d2397e5a4f3a72b330 (from #1275)
- https://github.com/GoogleCloudDataproc/initialization-actions/pull/1298/commits/98573ff5afbb19748795e2d8cdd7cd086c668a78 (#1298)
- https://github.com/GoogleCloudDataproc/initialization-actions/pull/1301/commits/d8f5ab07c034a6ff66a8fbe92f3a33c9ea018124 (#1301)