sagemaker-sparkml-serving-container
Cannot use vector as input struct type due to: java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
I am trying to deploy a bundled Spark ML NaiveBayesModel with sagemaker-sparkml-serving-container.
I am running sagemaker-sparkml-serving-container with the following command:
SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double"}}'
BUNDLE=/tmp/naivebayes_bundle
docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA="$SCHEMA" -v $BUNDLE:/opt/ml/model sagemaker-sparkml-serving:2.2 serve
When calling /invocations with:
curl -i -H "content-type:application/json" http://localhost:8080/invocations -d '{"data":[[1.0,2.0,3.0]]}'
The following error is thrown:
java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier$$anonfun$1.apply(NaiveBayesClassifier.scala:19) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.ArrayRow.udfValue(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.Row$class.withValues(Row.scala:225) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.ArrayRow.withValues(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
at scala.collection.immutable.Stream.map(Stream.scala:418) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:78) ~[sparkml-serving-2.2.jar:2.2]
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
at scala.util.Try$.apply(Try.scala:192) ~[sparkml-serving-2.2.jar:2.2]
at scala.util.Success.map(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:77) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:72) ~[sparkml-serving-2.2.jar:2.2]
at scala.util.Success.flatMap(Try.scala:231) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.DefaultLeapFrame.withColumns(DefaultLeapFrame.scala:71) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.frame.MultiTransformer$class.transform(Transformer.scala:121) ~[sparkml-serving-2.2.jar:2.2]
at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier.transform(NaiveBayesClassifier.scala:13) ~[sparkml-serving-2.2.jar:2.2]
at com.amazonaws.sagemaker.utils.ScalaUtils.transformLeapFrame(ScalaUtils.java:44) ~[sparkml-serving-2.2.jar:2.2]
at com.amazonaws.sagemaker.controller.ServingController.processInputData(ServingController.java:176) ~[sparkml-serving-2.2.jar:2.2]
at com.amazonaws.sagemaker.controller.ServingController.transformRequestJson(ServingController.java:118) ~[sparkml-serving-2.2.jar:2.2]
The bundle was created with the following dependencies:
org.apache.spark:spark-core_2.11:2.4.0
org.apache.spark:spark-mllib_2.11:2.4.0
ml.combust.mleap:mleap-spark_2.11:0.12.0
Kotlin code that creates the bundle:
import org.apache.spark.ml.classification.NaiveBayes
import ml.combust.mleap.spark.SimpleSparkSerializer

val model = NaiveBayes()
    .setModelType("multinomial")
    .fit(data)

SimpleSparkSerializer().serializeToBundle(model, "file:/tmp/naivebayes_bundle", model.transform(data))
Hey, thanks for using sagemaker-sparkml-serving. From the stack trace you posted, it looks like your model is, for some reason, returning an output of type Array instead of a single value.
Please change the schema to output an Array instead of a single value and see whether that gives you valid output. You may need to extract some information from the response, depending on your underlying use case.
The schema should be changed like this:
SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double","struct":"array"}}'
Thanks for the fast response. Unfortunately, your suggestion doesn't fix the problem; it throws exactly the same exception and stack trace.
It seems that the input features are passed to the prediction as a JListWrapper instead of a Tensor.
https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/transformer/classification/NaiveBayesClassifier.scala#L19
It looks like your bundle was created with Spark 2.4 and MLeap 0.12.0. At this point, MLeap does not support anything beyond Spark 2.3, and this container is only tested with Spark 2.2 and MLeap 0.9.6.
Since NaiveBayes is available in Spark 2.2 as well, it will be easier for me to replicate if you switch to Spark 2.2.1 and MLeap 0.9.6 and try to reproduce the same error.
@make I encountered the same problem you did, and after a lot of debugging I figured it out. It has nothing to do with the version of Spark or MLeap: inside DataConversionHelper, the function convertInputDataToJavaType assumes that whenever the DataStructureType is not empty or BASIC, it must be an array.
Therefore, the code as it stands will never create a Vector and will not work with any pipeline whose entry point is a Vector (such as any trained estimator that requires a features column). I fixed the code and will try to open a pull request over the weekend.
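To illustrate the diagnosis above, here is a minimal self-contained sketch of the dispatch logic being described. This is not the actual DataConversionHelper source; the method names and signatures are illustrative. The point is that a "vector" struct must be converted to a dense numeric array (which the serving layer can then wrap as an MLeap Tensor), rather than being passed through as a Java List, which later fails the cast to ml.combust.mleap.tensor.Tensor.

```java
import java.util.Arrays;
import java.util.List;

public class ConversionSketch {

    // Buggy behavior: any non-basic struct stays a List, which MLeap later
    // receives as a JListWrapper and fails to cast to Tensor.
    static Object convertBuggy(String struct, List<Double> values) {
        return values;
    }

    // Fixed behavior: a "vector" struct is unboxed into a dense double[],
    // the shape the transformer's features column actually expects.
    static Object convertFixed(String struct, List<Double> values) {
        if ("vector".equals(struct)) {
            double[] dense = new double[values.size()];
            for (int i = 0; i < values.size(); i++) {
                dense[i] = values.get(i);
            }
            return dense;
        }
        return values; // "array" and basic types pass through unchanged
    }

    public static void main(String[] args) {
        List<Double> input = Arrays.asList(1.0, 2.0, 3.0);
        // The buggy path keeps the wrong shape; the fixed path produces a dense vector.
        System.out.println(convertBuggy("vector", input) instanceof List);     // true: wrong shape
        System.out.println(convertFixed("vector", input) instanceof double[]); // true: dense vector
    }
}
```

Under this reading, the schema is irrelevant: no "input" struct value ever reaches a Tensor, which matches the observation that changing the output struct did not change the exception.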
@jorgeglezlopez Hi there, I'm using MLeap 0.14.0 with Spark 2.4.3. I deployed a model to a SageMaker endpoint and am still facing the same issue. Do you know when the changes, with the updated Docker image for Spark 2.4 support, will be pushed? Thanks
There is a fix for this that should be merged into master: https://github.com/aws/sagemaker-sparkml-serving-container/pull/11
I have been trying to use the latest code here and am getting a similar error.
Commands:
git clone https://github.com/aws/sagemaker-sparkml-serving-container.git
cd sagemaker-sparkml-serving-container
docker build -t sagemaker-sparkml-serving:2.4 .
docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"probability","type":"double","struct":"vector"}}' -v /Users/prasprak/mldocker/open_models/mleap_model/tar/logreg/:/opt/ml/model sagemaker-sparkml-serving:2.4 serve
Note: My input is of type vector and the output is also of type vector.
For invocations:
curl -i -H "Accept: application/jsonlines;data=text" -H "content-type:application/json" -d '{"data":[[-1.0, 1.5, 1.2]]}' http://localhost:8080/invocations
java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
at ml.combust.mleap.runtime.transformer.classification.LogisticRegression$$anonfun$1.apply(LogisticRegression.scala:19) ~[sparkml-serving-2.4.jar:2.4]
at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.4.jar:2.4]
Also, I see that the fix mentioned above has not been merged to master. I tried the branch where the fix is provided, but I get a different error with that.