
"unable to evaluate payload provided" with KMeans Clustering Algorithm

Open hdamani09 opened this issue 4 years ago • 1 comment

Please fill out the form below.

System Information

  • Spark:
  • SDK Version: spark_2.2.0-1.2.5
  • Spark Version: 2.4.3
  • Algorithm: KMeans

Describe the problem

Hi, I trained a model with the SageMaker Spark KMeans algorithm on a sample CSV dataset, using the hyperparameters shown below:

"hyperParameters": {
        "feature_dim": "6",
        "k": "10",
        "mini_batch_size": "15"
}
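One thing worth double-checking before digging into the stack trace: `feature_dim` is set to 6, but the assembled `features` vectors shown below contain 7 values each (the `Result_of_Treatment` label column appears to have been included in the assembly). The built-in KMeans endpoint rejects payloads whose dimensionality differs from the `feature_dim` it was trained with, which is a common cause of this kind of 400. A minimal, framework-free sketch of that sanity check (the vectors are copied from the first two rows of the DataFrame below):

```python
# Check that every feature vector matches the declared feature_dim.
# A dimensionality mismatch is a common cause of a 400
# "unable to evaluate payload provided" from a built-in algorithm endpoint.

FEATURE_DIM = 6  # from the hyperparameters above

# First two rows of the DataFrame below: 7 values per vector, because
# Result_of_Treatment was assembled into the features column as well.
vectors = [
    [0.0, 1.0, 17.0, 9.25, 12.0, 1.0, 10.0],
    [0.0, 1.0, 17.0, 11.5, 2.0, 1.0, 10.0],
]

def mismatched(vecs, feature_dim):
    """Return indices of vectors whose length differs from feature_dim."""
    return [i for i, v in enumerate(vecs) if len(v) != feature_dim]

bad = mismatched(vectors, FEATURE_DIM)
print(bad)  # -> [0, 1]: both rows have 7 values, not 6
```

If this check flags rows, the fix is upstream in the feature-assembly step rather than at the endpoint.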

The dataframe after doing the feature engineering vector assembly is as follows:

+-------------------+---+---+-----+---------------+----+----+----------------------------------+
|Result_of_Treatment|sex|age|Time |Number_of_Warts|Type|Area|features |
+-------------------+---+---+-----+---------------+----+----+----------------------------------+
|0 |1 |17 |9.25 |12 |1 |10 |[0.0,1.0,17.0,9.25,12.0,1.0,10.0] |
|0 |1 |17 |11.5 |2 |1 |10 |[0.0,1.0,17.0,11.5,2.0,1.0,10.0] |
|0 |1 |23 |10.25|7 |3 |72 |[0.0,1.0,23.0,10.25,7.0,3.0,72.0] |
|0 |1 |29 |11.75|5 |1 |96 |[0.0,1.0,29.0,11.75,5.0,1.0,96.0] |
|0 |1 |34 |11.25|1 |3 |150 |[0.0,1.0,34.0,11.25,1.0,3.0,150.0]|
|0 |1 |34 |12.0 |1 |3 |150 |[0.0,1.0,34.0,12.0,1.0,3.0,150.0] |
|0 |1 |40 |11.5 |9 |2 |80 |[0.0,1.0,40.0,11.5,9.0,2.0,80.0] |
|0 |1 |50 |8.0 |1 |3 |132 |[0.0,1.0,50.0,8.0,1.0,3.0,132.0] |
|0 |1 |50 |8.75 |11 |3 |132 |[0.0,1.0,50.0,8.75,11.0,3.0,132.0]|
|0 |1 |63 |2.75 |3 |3 |20 |[0.0,1.0,63.0,2.75,3.0,3.0,20.0] |
|0 |1 |67 |3.75 |11 |3 |20 |[0.0,1.0,67.0,3.75,11.0,3.0,20.0] |
|0 |2 |23 |11.75|12 |3 |72 |[0.0,2.0,23.0,11.75,12.0,3.0,72.0]|
|0 |2 |24 |9.5 |3 |3 |20 |[0.0,2.0,24.0,9.5,3.0,3.0,20.0] |
|0 |2 |27 |8.75 |2 |1 |6 |[0.0,2.0,27.0,8.75,2.0,1.0,6.0] |
|0 |2 |32 |12.0 |4 |3 |750 |[0.0,2.0,32.0,12.0,4.0,3.0,750.0] |
|0 |2 |34 |11.25|3 |3 |150 |[0.0,2.0,34.0,11.25,3.0,3.0,150.0]|
|0 |2 |34 |12.0 |3 |3 |95 |[0.0,2.0,34.0,12.0,3.0,3.0,95.0] |
|0 |2 |35 |8.5 |6 |3 |100 |[0.0,2.0,35.0,8.5,6.0,3.0,100.0] |
|0 |2 |36 |10.5 |4 |1 |8 |[0.0,2.0,36.0,10.5,4.0,1.0,8.0] |
|1 |1 |15 |3.5 |2 |1 |4 |[1.0,1.0,15.0,3.5,2.0,1.0,4.0] |
+-------------------+---+---+-----+---------------+----+----+----------------------------------+
only showing top 20 rows

root
|-- Result_of_Treatment: integer (nullable = true)
|-- sex: integer (nullable = true)
|-- age: integer (nullable = true)
|-- Time: double (nullable = true)
|-- Number_of_Warts: integer (nullable = true)
|-- Type: integer (nullable = true)
|-- Area: integer (nullable = true)
|-- features: vector (nullable = true)
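For comparison, an assembly that matches `feature_dim = 6` would exclude the label column. A framework-free sketch of what that row-level assembly looks like (in Spark this would be a `VectorAssembler` whose `inputCols` list the six feature columns only; the column names are taken from the schema above):

```python
# Assemble a 6-dimensional feature vector, excluding the
# Result_of_Treatment label, so that it matches feature_dim = 6.
FEATURE_COLS = ["sex", "age", "Time", "Number_of_Warts", "Type", "Area"]

# First row of the DataFrame above, as a plain dict.
row = {
    "Result_of_Treatment": 0, "sex": 1, "age": 17,
    "Time": 9.25, "Number_of_Warts": 12, "Type": 1, "Area": 10,
}

features = [float(row[c]) for c in FEATURE_COLS]
print(features)       # -> [1.0, 17.0, 9.25, 12.0, 1.0, 10.0]
print(len(features))  # -> 6
```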

When I try to transform this DataFrame with the KMeans model, I get the following exception stack trace. Can someone help me understand and resolve why this is occurring?

Minimal repro / logs

org.apache.spark.SparkException : Job aborted due to stage failure: Task 0 in stage 14.0 failed 4 times, most recent failure: Lost task 0.3 in stage 14.0 (TID 16, ip-172-21-87-93.aws.com, executor 1): com.amazonaws.services.sagemakerruntime.model.ModelErrorException: Received client error (400) from hd-kmeans-Model-20191031-184713 with message "unable to evaluate payload provided". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/hd-kmeans-endpoint-20191031-184713 in account 820784505615 for more information. (Service: AmazonSageMakerRuntime; Status Code: 424; Error Code: ModelError; Request ID: 5b811028-e055-4a53-b0f3-3c7df73565ca)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
  at com.amazonaws.services.sagemakerruntime.AmazonSageMakerRuntimeClient.doInvoke(AmazonSageMakerRuntimeClient.java:235)
  at com.amazonaws.services.sagemakerruntime.AmazonSageMakerRuntimeClient.invoke(AmazonSageMakerRuntimeClient.java:211)
  at com.amazonaws.services.sagemakerruntime.AmazonSageMakerRuntimeClient.executeInvokeEndpoint(AmazonSageMakerRuntimeClient.java:175)
  at com.amazonaws.services.sagemakerruntime.AmazonSageMakerRuntimeClient.invokeEndpoint(AmazonSageMakerRuntimeClient.java:151)
  at com.amazonaws.services.sagemaker.sparksdk.transformation.util.RequestBatchIterator.hasNext(RequestBatchIterator.scala:133)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
  at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
  at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
  at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
  at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
  at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
  at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
  at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
Cause : com.amazonaws.services.sagemakerruntime.model.ModelErrorException: Received client error (400) from hd-kmeans-Model-20191031-184713 with message "unable to evaluate payload provided". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/hd-kmeans-endpoint-20191031-184713 in account 820784505615 for more information. (Service: AmazonSageMakerRuntime; Status Code: 424; Error Code: ModelError; Request ID: 5b811028-e055-4a53-b0f3-3c7df73565ca)

hdamani09 avatar Oct 31 '19 19:10 hdamani09

Driver stacktrace: Cause : com.amazonaws.services.sagemakerruntime.model.ModelErrorException: Received client error (400) from hd-kmeans-Model-20191031-184713 with message "unable to evaluate payload provided". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/hd-kmeans-endpoint-20191031-184713 in account 820784505615 for more information. (Service: AmazonSageMakerRuntime; Status Code: 424; Error Code: ModelError; Request ID: 5b811028-e055-4a53-b0f3-3c7df73565ca)

Could you provide logs of the error from https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/hd-kmeans-endpoint-20191031-184713 in account 820784505615?
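For anyone following along: a log group like the one in the error message can be queried from the AWS CLI as well as the console. A generic CloudWatch Logs invocation (the log group name is copied from the error above; this requires credentials for the account in question):

```shell
aws logs filter-log-events \
    --region us-east-1 \
    --log-group-name /aws/sagemaker/Endpoints/hd-kmeans-endpoint-20191031-184713
```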

nadiaya avatar Nov 06 '19 21:11 nadiaya