spark-knn

Model save support

Open davis-varghese opened this issue 7 years ago • 5 comments

I saved a model (KNNClassificationModel) using Java serialization, and when I use it later I always get java.lang.IllegalArgumentException: Flat hash tables cannot contain null elements on the DataFrame output of model.transform(inputDataFrame).

Is there a better way of saving and reusing the model, such as support for the MLWritable/Saveable traits? In our use case, we create a model once and use it later.
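
A minimal sketch of the API being asked for, assuming KNNClassificationModel implemented MLWritable and MLReadable (today it implements neither, which is the point of this issue; the save path is illustrative):

// hypothetical: only possible if the model gained MLWritable/MLReadable support
knnModel.write.overwrite().save("/models/knn")
val restored = KNNClassificationModel.load("/models/knn")
restored.transform(inputDataFrame)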

davis-varghese avatar Aug 23 '16 02:08 davis-varghese

I am also looking for a solution to save the model using Scala Spark.

mindcrusher11 avatar Sep 05 '16 12:09 mindcrusher11

I also had this problem. Is there any solution for it?

Sambor123 avatar Mar 06 '17 06:03 Sambor123

I've tried like this:

sc.parallelize(Seq(knnModel), 1).saveAsObjectFile("/user/you/knnTest/" + "KNN")
val model = sc.objectFile[KNNClassificationModel]("/user/you/knnTest/" + "KNN").first()

but the model pulled back in no longer seems to work, which is strange since this has worked for all my other models.

rachmaninovquartet avatar Jun 27 '17 22:06 rachmaninovquartet

I also encountered this problem. I attempted to serialize this model and load it again, but the RDD[Tree] cannot be deserialized correctly; it looks like the metricTree has some problem. If you have a solution, please comment.

wzjmail avatar Jul 03 '20 06:07 wzjmail

Saving via PipelineModel.save() also does not work:

Caused by: java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: knnc_95d9ce15f990 of type class org.apache.spark.ml.classification.KNNClassificationModel
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:231)
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:228)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:228)
at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:336)
at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:320)
at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:306)
at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:293)
... 16 more
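
For reference, a minimal sketch of the kind of pipeline that triggers this error (the KNNClassifier import follows the package shown in the trace; trainingDF and the column names are assumptions):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.KNNClassifier

// Fit a pipeline whose only stage is the KNN classifier, then try to persist it.
val knn = new KNNClassifier().setFeaturesCol("features").setLabelCol("label")
val pipelineModel = new Pipeline().setStages(Array(knn)).fit(trainingDF)
pipelineModel.save("/models/knn-pipeline") // fails: KNNClassificationModel does not implement Writable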

alexnb avatar Aug 11 '20 12:08 alexnb