SynapseML
[BUG] NoSuchMethodError breeze.linalg.SliceVector
SynapseML version
0.10.1-22-95f451ab-SNAPSHOT
System information
- Language version: python 3.8, scala 2.12
- Spark Version : 3.2
- Spark Platform : Synapse
Describe the problem
Getting the following error when using TabularSHAP in SynapseML. This was working a few weeks ago and then suddenly stopped; no changes were made to the Synapse Spark pool.
Code to reproduce issue
from pyspark.ml.regression import LinearRegression
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from synapse.ml.explainers import TabularSHAP

df1 = spark.createDataFrame([
    {'a': 1.0, 'b': 2, 'c': 1.0},
    {'a': 2.7, 'b': 3, 'c': 2.0},
    {'a': 3.8, 'b': 4, 'c': 3.0},
    {'a': 5.7, 'b': 9, 'c': 4.0},
    {'a': 4.2, 'b': 10, 'c': 1.0},
    {'a': 6.1, 'b': 15, 'c': 10.0},
])

model = LinearRegression(
    featuresCol="features",
    labelCol='b',
)
assembler = VectorAssembler(inputCols=['a', 'b'], outputCol="features")
pipeline = Pipeline(stages=[assembler, model])
lrModel = pipeline.fit(df1)

shap = TabularSHAP(
    inputCols=['a', 'b'],
    outputCol='outCol',
    model=lrModel,
    backgroundData=df1,
    targetCol='prediction',
)
df_shap = shap.transform(df1)
display(df_shap)
Other info / logs
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_6705/3699412776.py in <module>
26 df_shap = shap.transform(df1)
27
---> 28 display(df_shap)
~/cluster-env/clonedenv/lib/python3.8/site-packages/notebookutils/visualization/display.py in display(data, summary)
238 log4jLogger \
239 .error(f"display failed with error, language: python, error: {err}, correlationId={correlation_id}")
--> 240 raise err
241 finally:
242 duration_ms = ceil((time.time() - start_time) * 1000)
~/cluster-env/clonedenv/lib/python3.8/site-packages/notebookutils/visualization/display.py in display(data, summary)
216 from IPython.display import publish_display_data
217 publish_display_data({
--> 218 "application/vnd.synapse.display-widget+json": sc._jvm.display.getDisplayResultForIPython(
219 df._jdf, summary, correlation_id)
220 })
~/cluster-env/clonedenv/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
1319
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1323
/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
~/cluster-env/clonedenv/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling z:com.microsoft.spark.notebook.visualization.display.getDisplayResultForIPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 45 in stage 510.0 failed 4 times, most recent failure: Lost task 45.3 in stage 510.0 (TID 12952) (vm-68767911 executor 1): org.apache.spark.SparkException: Failed to execute user defined function (functions$$$Lambda$8434/530344905: (struct<a:bigint>, struct<a:bigint>) => array<struct<sample:struct<a:bigint>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>)
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:136)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:763)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:300)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:375)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateSampleSizes(KernelSHAPSampler.scala:47)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateSampleSizes$(KernelSHAPSampler.scala:44)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.generateSampleSizes(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateCoalitions(KernelSHAPSampler.scala:137)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateCoalitions$(KernelSHAPSampler.scala:129)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.generateCoalitions(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator(KernelSHAPSampler.scala:34)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator$(KernelSHAPSampler.scala:33)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator$lzycompute(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.nextState(KernelSHAPSampler.scala:31)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.nextState$(KernelSHAPSampler.scala:31)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.nextState(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSampler.sample(KernelSHAPSampler.scala:16)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSampler.sample$(KernelSHAPSampler.scala:15)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.sample(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$7(TabularSHAP.scala:62)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$7$adapted(TabularSHAP.scala:60)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$6(TabularSHAP.scala:59)
at org.apache.spark.injections.UDFUtils$$anon$2.call(UDFUtils.scala:29)
at org.apache.spark.sql.functions$.$anonfun$udf$93(functions.scala:5188)
... 24 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2464)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2413)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2412)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2412)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1168)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1168)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1168)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2652)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2594)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2583)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function (functions$$$Lambda$8434/530344905: (struct<a:bigint>, struct<a:bigint>) => array<struct<sample:struct<a:bigint>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>)
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:136)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:763)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:300)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:375)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateSampleSizes(KernelSHAPSampler.scala:47)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateSampleSizes$(KernelSHAPSampler.scala:44)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.generateSampleSizes(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateCoalitions(KernelSHAPSampler.scala:137)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.generateCoalitions$(KernelSHAPSampler.scala:129)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.generateCoalitions(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator(KernelSHAPSampler.scala:34)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator$(KernelSHAPSampler.scala:33)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator$lzycompute(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.com$microsoft$azure$synapse$ml$explainers$KernelSHAPSamplerSupport$$randomStateGenerator(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.nextState(KernelSHAPSampler.scala:31)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSamplerSupport.nextState$(KernelSHAPSampler.scala:31)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.nextState(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSampler.sample(KernelSHAPSampler.scala:16)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPSampler.sample$(KernelSHAPSampler.scala:15)
at com.microsoft.azure.synapse.ml.explainers.KernelSHAPTabularSampler.sample(Sampler.scala:202)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$7(TabularSHAP.scala:62)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$7$adapted(TabularSHAP.scala:60)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at com.microsoft.azure.synapse.ml.explainers.TabularSHAP.$anonfun$createSamples$6(TabularSHAP.scala:59)
at org.apache.spark.injections.UDFUtils$$anon$2.call(UDFUtils.scala:29)
at org.apache.spark.sql.functions$.$anonfun$udf$93(functions.scala:5188)
... 24 more
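For context on where the assertion fires: `KernelSHAPSamplerSupport.generateSampleSizes` splits the configured sample budget across coalition sizes, which Kernel SHAP conventionally weights by the Shapley kernel. The sketch below is a hypothetical, simplified Python illustration of that weighting scheme (the function names and the proportional-allocation rounding are my own, not the library's Scala implementation):

```python
from math import comb

def shapley_kernel_weights(m: int) -> list[float]:
    """Shapley kernel weight for each coalition size s in 1..m-1:
    pi(s) = (m - 1) / (C(m, s) * s * (m - s)), normalized to sum to 1.
    Illustrative only; not the library's Scala code."""
    raw = [(m - 1) / (comb(m, s) * s * (m - s)) for s in range(1, m)]
    total = sum(raw)
    return [w / total for w in raw]

def allocate_samples(num_samples: int, m: int) -> list[int]:
    """Split a sample budget across coalition sizes in proportion to the
    kernel weights (rounding down; remainder handling omitted)."""
    weights = shapley_kernel_weights(m)
    return [int(num_samples * w) for w in weights]

# With only 2 input columns, as in the repro above, there is a single
# coalition size (s = 1), so it receives the whole budget.
print(allocate_samples(8, 2))   # → [8]
print(allocate_samples(10, 3))  # → [5, 5]
```

With very few features, the set of distinct coalitions is tiny, so sample-size bookkeeping in this step is a plausible place for an internal assertion to trip; the actual condition checked at KernelSHAPSampler.scala:47 is not visible from this trace.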
What component(s) does this bug affect?
- [ ] area/cognitive: Cognitive project
- [X] area/core: Core project
- [ ] area/deep-learning: DeepLearning project
- [ ] area/lightgbm: Lightgbm project
- [ ] area/opencv: Opencv project
- [ ] area/vw: VW project
- [ ] area/website: Website
- [ ] area/build: Project build system
- [ ] area/notebooks: Samples under notebooks folder
- [ ] area/docker: Docker usage
- [ ] area/models: models related issue
What language(s) does this bug affect?
- [ ] language/scala: Scala source code
- [X] language/python: Pyspark APIs
- [ ] language/r: R APIs
- [ ] language/csharp: .NET APIs
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [X] integrations/synapse: Azure Synapse integrations
- [ ] integrations/azureml: Azure ML integrations
- [ ] integrations/databricks: Databricks integrations
Hello, I have the same issue as you. Have you found a solution for this? Thank you!
@MalekBhz Unfortunately I didn't find a solution for Spark 3.2; I had to upgrade to Spark 3.3.
Hi, Spark 3.2 is no longer supported by Synapse, and I can no longer create a 3.2 pool to reproduce the issue. However, I did test your code in Spark 3.3 and 3.4 pools with the following configurations, and the code works in both cases.
Spark 3.3 pool:
%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
        "spark.yarn.user.classpath.first": "true",
        "spark.sql.parquet.enableVectorizedReader": "false"
    }
}
Spark 3.4 pool:
%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.5",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
        "spark.yarn.user.classpath.first": "true",
        "spark.sql.parquet.enableVectorizedReader": "false"
    }
}