mleap
key not found: org.apache.spark.ml.feature.ImputerModel
According to the docs, the Imputer is supported, but I get this error when trying to save the bundle file. Here are my dependency versions: Spark 2.2.0, "ml.combust.mleap" %% "mleap-runtime" % "0.9.5", "ml.combust.mleap" %% "mleap-spark" % "0.9.5". I don't know what to do; can you help me, please?
Exception in thread "main" java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.ImputerModel
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at ml.combust.bundle.BundleRegistry.opForObj(BundleRegistry.scala:84)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:31)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:30)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.GraphSerializer.writeNode(GraphSerializer.scala:30)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at ml.combust.bundle.serializer.GraphSerializer.write(GraphSerializer.scala:20)
at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:21)
at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:14)
at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:87)
at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:83)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.ModelSerializer.write(ModelSerializer.scala:83)
at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:85)
at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:81)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.NodeSerializer.write(NodeSerializer.scala:81)
at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:34)
at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:29)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.BundleSerializer.write(BundleSerializer.scala:29)
at ml.combust.bundle.BundleWriter.save(BundleWriter.scala:26)
at com.zhihu.saturn.offline.process.CalculateLRModel$$anonfun$training$2.apply(CalculateLRModel.scala:161)
at com.zhihu.saturn.offline.process.CalculateLRModel$$anonfun$training$2.apply(CalculateLRModel.scala:160)
at resource.AbstractManagedResource$$anonfun$5.apply(AbstractManagedResource.scala:88)
at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
at scala.util.control.Exception$Catch.apply(Exception.scala:103)
at scala.util.control.Exception$Catch.either(Exception.scala:125)
at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
at resource.ManagedResourceOperations$class.apply(ManagedResourceOperations.scala:26)
at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
at resource.ManagedResourceOperations$class.acquireAndGet(ManagedResourceOperations.scala:25)
at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
at resource.ManagedResourceOperations$class.foreach(ManagedResourceOperations.scala:53)
at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50)
at com.zhihu.saturn.offline.process.CalculateLRModel$.training(CalculateLRModel.scala:160)
at com.zhihu.saturn.offline.process.CalculateLRModel$.run(CalculateLRModel.scala:51)
at com.zhihu.saturn.offline.process.CalculateLRModel$.main(CalculateLRModel.scala:225)
at com.zhihu.saturn.offline.process.CalculateLRModel.main(CalculateLRModel.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
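For context, the `resource.*` and `BundleWriter.save` frames in the trace correspond to the standard MLeap bundle-export pattern. A minimal sketch of that pattern, assuming a fitted `pipelineModel` and a placeholder output path:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import resource._

// sketch of the export that triggers the error; /tmp/model.zip is a placeholder path
for (bundleFile <- managed(BundleFile("jar:file:/tmp/model.zip"))) {
  // fails with NoSuchElementException: no bundle op is registered
  // for org.apache.spark.ml.feature.ImputerModel in mleap-spark alone
  pipelineModel.writeBundle.save(bundleFile).get
}
```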
@caesarjuly Try adding "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5" as a dependency and use
import org.apache.spark.ml.mleap.feature.Imputer
from there instead.
I believe the out-of-the-box Spark transformer can work on multiple columns, which isn't supported in MLeap at the moment. The transformer from mleap-spark-extension works the same as Spark's, with the additional restriction that it operates on just a single column.
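Putting the suggestion together, a minimal sketch (assuming sbt, and assuming the extension Imputer mirrors Spark's single-column setters; the column names are placeholders):

```scala
// build.sbt
libraryDependencies ++= Seq(
  "ml.combust.mleap" %% "mleap-runtime"         % "0.9.5",
  "ml.combust.mleap" %% "mleap-spark"           % "0.9.5",
  "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5"
)
```

```scala
// use the MLeap-serializable Imputer instead of org.apache.spark.ml.feature.Imputer
import org.apache.spark.ml.mleap.feature.Imputer

val imputer = new Imputer()
  .setInputCol("rawFeature")      // placeholder column name
  .setOutputCol("imputedFeature") // placeholder column name
  .setStrategy("mean")
```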
@ancasarb Thank you very much, that should be the key; I'll try it later. One more question: what's the relationship between mleap-spark and mleap-spark-extension, and which one should I use?
By the way, after reading and using this project, I'd really like to participate in it. Is there any way to join? I feel there are many features waiting to be added.
Any developments on this issue?
@gabtibe please see the answer above, about using the Imputer from mleap-extensions. Let me know if you have any questions!
@ancasarb Thanks for your response. I did use the Imputer from mleap-spark-extension, but I was wondering if there's any plan to support the standard Imputer from Spark, as I noticed it speeds up computation and reduces the number of steps in the pipeline to be saved.
+1 for support of Spark's ImputerModel; it makes exporting existing pipelines much easier.
I have the same issue with PySpark 2.3.0. From what I saw, mleap-spark-extension is not available in Python.
I am quite confused about MLeap's support for the Spark Imputer.
Why doesn't the documentation mention that the Spark Imputer is only supported via the object from mleap-spark-extension? The Supported Transformers table lists the Imputer as supported, without any further explanation.
Do I understand correctly that it's not possible to use the Imputer when creating a pipeline in PySpark?
EDIT: I have also asked a Stack Overflow question about this: https://stackoverflow.com/questions/71209926/mleap-support-spark-ml-imputer