
Cannot use vector as input struct type due to: java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor

Open make opened this issue 6 years ago • 7 comments

I am trying to deploy a bundled Spark ML NaiveBayesModel with sagemaker-sparkml-serving-container.

I am running sagemaker-sparkml-serving-container with the following command:

SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double"}}'
BUNDLE=/tmp/naivebayes_bundle
docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA="$SCHEMA" -v $BUNDLE:/opt/ml/model sagemaker-sparkml-serving:2.2 serve

When calling /invocations with:

curl -i -H "content-type:application/json" http://localhost:8080/invocations -d '{"data":[[1.0,2.0,3.0]]}'

The following error is thrown:

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
	at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier$$anonfun$1.apply(NaiveBayesClassifier.scala:19) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.ArrayRow.udfValue(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.Row$class.withValues(Row.scala:225) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.ArrayRow.withValues(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at scala.collection.immutable.Stream.map(Stream.scala:418) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:78) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success$$anonfun$map$1.apply(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Try$.apply(Try.scala:192) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success.map(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:77) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:72) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success.flatMap(Try.scala:231) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame.withColumns(DefaultLeapFrame.scala:71) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.MultiTransformer$class.transform(Transformer.scala:121) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier.transform(NaiveBayesClassifier.scala:13) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.utils.ScalaUtils.transformLeapFrame(ScalaUtils.java:44) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.controller.ServingController.processInputData(ServingController.java:176) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.controller.ServingController.transformRequestJson(ServingController.java:118) ~[sparkml-serving-2.2.jar:2.2]

I created the bundle with the following dependencies:

org.apache.spark:spark-core_2.11:2.4.0
org.apache.spark:spark-mllib_2.11:2.4.0
ml.combust.mleap:mleap-spark_2.11:0.12.0

Kotlin code that creates the bundle:

val model = NaiveBayes()
        .setModelType("multinomial")
        .fit(data)
SimpleSparkSerializer().serializeToBundle(model, "file:/tmp/naivebayes_bundle", model.transform(data))

make · Nov 30 '18 15:11

Hey, thanks for using sagemaker-sparkml-serving. From the stack trace you posted, it looks like your model is, for some reason, returning an output of type Array instead of a single value.

Please change the schema to declare the output as an Array instead of a single value and see if that gives you a valid output. You may need to extract some information from the response, depending on your underlying use-case.

The schema should be changed like this:

SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double","struct":"array"}}'
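If the output is declared as an array (for example, class probabilities), the caller then has to extract the prediction on the client side. A minimal sketch in plain Java, with a hypothetical `argmax` helper (not part of the container or MLeap), picking the predicted class from an array-typed response:

```java
public class PredictionClient {
    // Hypothetical helper: returns the index of the largest value in an
    // array-typed response such as [0.1, 0.7, 0.2].
    public static int argmax(double[] probabilities) {
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] response = {0.1, 0.7, 0.2};
        System.out.println("Predicted class: " + argmax(response));
    }
}
```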

orchidmajumder · Dec 01 '18 03:12

Thanks for the fast response. Your suggestion doesn't fix the problem: it throws exactly the same exception and stack trace.

It seems that the input features are passed to the prediction as a JListWrapper instead of a Tensor. https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/transformer/classification/NaiveBayesClassifier.scala#L19

make · Dec 03 '18 06:12

It looks like your bundle was created with Spark 2.4 and MLeap 0.12.0. At this point, MLeap does not support Spark versions beyond 2.3, and this container is only tested with Spark 2.2 and MLeap 0.9.6.

As NaiveBayes is available in Spark 2.2 as well, it will be easier for me to replicate if you switch to Spark 2.2.1 and MLeap 0.9.6 and try to reproduce the same error again.

orchidmajumder · Dec 03 '18 16:12

@make I encountered the same problem you did, and after a lot of debugging I figured it out. It has nothing to do with the version of Spark or MLeap: it happens because, inside DataConversionHelper, the function convertInputDataToJavaType assumes that whenever the DataStructureType is not empty or BASIC, it is an array.

Therefore, the code as it stands will never create a Vector and will not work with any pipeline whose entry point requires a Vector (such as any trained estimator that expects features). I fixed the code and will try to create a pull request over the weekend.
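To illustrate the bug described above, here is a sketch in plain Java with hypothetical names (this is not the container's actual DataConversionHelper code, and DenseVector stands in for MLeap's Tensor type): the conversion needs a third branch that turns a "vector"-typed input into a tensor, rather than leaving it as a Java List that the transformer later fails to cast.

```java
import java.util.List;

public class InputConversionSketch {
    // Hypothetical stand-in for MLeap's dense Tensor type.
    public static class DenseVector {
        final double[] values;
        DenseVector(double[] values) { this.values = values; }
    }

    // Sketch of the fixed branching: "basic" returns the scalar,
    // "vector" builds a dense vector, and only "array" stays a List.
    // The buggy version treated every non-basic struct as an array,
    // so a List reached the model where a Tensor was expected,
    // producing the ClassCastException seen in the stack traces.
    public static Object convertInput(List<Double> data, String structType) {
        if (structType == null || structType.equals("basic")) {
            return data.get(0);
        }
        if (structType.equals("vector")) {
            double[] values = new double[data.size()];
            for (int i = 0; i < data.size(); i++) {
                values[i] = data.get(i);
            }
            return new DenseVector(values);
        }
        return data; // "array": keep as a List
    }

    public static void main(String[] args) {
        Object out = convertInput(List.of(1.0, 2.0, 3.0), "vector");
        System.out.println(out instanceof DenseVector);
    }
}
```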

jorgeglezlopez · Oct 18 '19 20:10

@jorgeglezlopez Hi there, I'm using MLeap 0.14.0 with Spark 2.4.3. I deployed a model to a SageMaker endpoint and am still facing the same issue. Do you know when the changes with the updated Docker image for Spark 2.4 support will be pushed? Thanks

hdamani09 · Nov 04 '19 08:11

There is a fix for this that should be merged into master: https://github.com/aws/sagemaker-sparkml-serving-container/pull/11

timf-bonobos · Sep 23 '20 18:09

I have been trying to use the latest code here and am getting a similar error.

Commands:

git clone https://github.com/aws/sagemaker-sparkml-serving-container.git

cd sagemaker-sparkml-serving-container

docker build -t sagemaker-sparkml-serving:2.4 .

docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"probability","type":"double","struct":"vector"}}' -v /Users/prasprak/mldocker/open_models/mleap_model/tar/logreg/:/opt/ml/model sagemaker-sparkml-serving:2.4 serve

Note: My input is of type vector and output is also of type vector.

For invocations:

curl -i -H "Accept: application/jsonlines;data=text" -H "content-type:application/json" -d '{"data":[[-1.0, 1.5, 1.2]]}' http://localhost:8080/invocations

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
	at ml.combust.mleap.runtime.transformer.classification.LogisticRegression$$anonfun$1.apply(LogisticRegression.scala:19) ~[sparkml-serving-2.4.jar:2.4]
	at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.4.jar:2.4]

Also, I see that the fix provided here has not been merged to master. I tried to pull the branch containing the fix, but I get a different error with it.

prashantprakash · Dec 28 '20 20:12