mleap
key not found: org.apache.spark.ml.feature.ImputerModel
According to the docs, the Imputer is supported, but I get this error when trying to save the bundle file. Here are my dependency versions: Spark 2.2.0, "ml.combust.mleap" %% "mleap-runtime" % "0.9.5", "ml.combust.mleap" %% "mleap-spark" % "0.9.5". I don't know what to do; can you help me, please?
Exception in thread "main" java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.ImputerModel
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at ml.combust.bundle.BundleRegistry.opForObj(BundleRegistry.scala:84)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:31)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:30)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.GraphSerializer.writeNode(GraphSerializer.scala:30)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at ml.combust.bundle.serializer.GraphSerializer.write(GraphSerializer.scala:20)
at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:21)
at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:14)
at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:87)
at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:83)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.ModelSerializer.write(ModelSerializer.scala:83)
at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:85)
at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:81)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.NodeSerializer.write(NodeSerializer.scala:81)
at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:34)
at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:29)
at scala.util.Try$.apply(Try.scala:192)
at ml.combust.bundle.serializer.BundleSerializer.write(BundleSerializer.scala:29)
at ml.combust.bundle.BundleWriter.save(BundleWriter.scala:26)
at com.zhihu.saturn.offline.process.CalculateLRModel$$anonfun$training$2.apply(CalculateLRModel.scala:161)
at com.zhihu.saturn.offline.process.CalculateLRModel$$anonfun$training$2.apply(CalculateLRModel.scala:160)
at resource.AbstractManagedResource$$anonfun$5.apply(AbstractManagedResource.scala:88)
at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
at scala.util.control.Exception$Catch.apply(Exception.scala:103)
at scala.util.control.Exception$Catch.either(Exception.scala:125)
at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
at resource.ManagedResourceOperations$class.apply(ManagedResourceOperations.scala:26)
at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
at resource.ManagedResourceOperations$class.acquireAndGet(ManagedResourceOperations.scala:25)
at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
at resource.ManagedResourceOperations$class.foreach(ManagedResourceOperations.scala:53)
at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50)
at com.zhihu.saturn.offline.process.CalculateLRModel$.training(CalculateLRModel.scala:160)
at com.zhihu.saturn.offline.process.CalculateLRModel$.run(CalculateLRModel.scala:51)
at com.zhihu.saturn.offline.process.CalculateLRModel$.main(CalculateLRModel.scala:225)
at com.zhihu.saturn.offline.process.CalculateLRModel.main(CalculateLRModel.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
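For context, the `resource.*` and `BundleWriter.save` frames in the trace correspond to the standard MLeap bundle-export pattern. A minimal sketch of that pattern, assuming a fitted `pipelineModel` and a placeholder output path:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import resource._

// sketch of the export that triggers the error; /tmp/model.zip is a placeholder path
for (bundleFile <- managed(BundleFile("jar:file:/tmp/model.zip"))) {
  // fails with NoSuchElementException: no bundle op is registered
  // for org.apache.spark.ml.feature.ImputerModel in mleap-spark alone
  pipelineModel.writeBundle.save(bundleFile).get
}
```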
@caesarjuly Try adding "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5" as a dependency and use
import org.apache.spark.ml.mleap.feature.Imputer
from there instead.
I believe the out-of-the-box Spark transformer can work on multiple columns, which isn't supported in MLeap at the moment. The transformer from mleap-spark-extension works the same as Spark's, with the additional restriction that it operates on just a single column.
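Putting the suggestion together, a minimal sketch (assuming sbt, and assuming the extension Imputer mirrors Spark's single-column setters; the column names are placeholders):

```scala
// build.sbt
libraryDependencies ++= Seq(
  "ml.combust.mleap" %% "mleap-runtime"         % "0.9.5",
  "ml.combust.mleap" %% "mleap-spark"           % "0.9.5",
  "ml.combust.mleap" %% "mleap-spark-extension" % "0.9.5"
)
```

```scala
// use the MLeap-serializable Imputer instead of org.apache.spark.ml.feature.Imputer
import org.apache.spark.ml.mleap.feature.Imputer

val imputer = new Imputer()
  .setInputCol("rawFeature")      // placeholder column name
  .setOutputCol("imputedFeature") // placeholder column name
  .setStrategy("mean")
```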
@ancasarb Thank you very much, that should be the key; I'll try it later. One more question: what's the relationship between mleap-spark and mleap-spark-extension, and which one should I use?
By the way, after reading and using this project, I'd really like to participate in it. Is there any way to join? I feel there are many features waiting to be added.
Any developments on this issue?
@gabtibe please see the answer above, about using the Imputer from mleap-extensions. Let me know if you have any questions!
@ancasarb Thanks for your response. I did use the Imputer from mleap-spark-extension, but I was wondering if there's any plan to support the standard Imputer from Spark, as I noticed it speeds up computation and reduces the number of steps in the pipeline to be saved.
+1 for support of Spark's ImputerModel; it makes exporting existing pipelines much easier.
I have the same issue with PySpark 2.3.0. From what I saw, mleap-spark-extension is not available in Python.
I am quite confused about MLeap's support for the Spark Imputer.
Why doesn't the documentation mention that the Spark Imputer is only supported via the object from mleap-spark-extension? The Supported Transformers table lists the Imputer as supported, without any further explanation.
Do I understand correctly that it's not possible to use the Imputer when creating a pipeline in PySpark?
EDIT: I have also asked a Stack Overflow question about this: https://stackoverflow.com/questions/71209926/mleap-support-spark-ml-imputer