Issue with saving the model with Python on Windows

Open MohamadSabha opened this issue 3 years ago • 7 comments

I would really like to thank you for this amazing work.

However, I'm trying to save the model so that I can use it later with Spark Streaming, but the save fails with this error: "The default jsonEncode only supports string, vector, and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Double"

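Here is my code:

    import os
    import tempfile

    temp_path = tempfile.mkdtemp()
    # Build the path with os.path.join rather than a hand-written "\" separator.
    iforest_path = os.path.join(temp_path, "iforest")
    iforest_model.save(iforest_path)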

I would really appreciate any help. Thanks in advance.

MohamadSabha, May 28 '22 16:05
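A minimal sketch of the save step with a portable path, assuming only that the fitted model exposes `save(path)` as in the snippet above. `save_model` is a hypothetical helper, not part of spark-iforest, and it addresses only the path handling on Windows, not the jsonEncode error itself.

```python
import os
import tempfile


def save_model(model, base_dir=None, name="iforest"):
    """Save `model` under base_dir/name and return the path that was used.

    Assumption: `model` exposes save(path), like the fitted model in the
    snippet above. This helper is hypothetical and not part of spark-iforest.
    """
    if base_dir is None:
        # Fall back to a fresh temporary directory, as in the original code.
        base_dir = tempfile.mkdtemp()
    # os.path.join picks the separator for the current OS.
    path = os.path.join(base_dir, name)
    model.save(path)
    return path
```

Calling `save_model(iforest_model)` would then replace the three lines in the original snippet and return the directory the model was written to.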

Are you using Spark 3.0+?

Maybe you can take a look at this issue, which might help. https://github.com/titicaca/spark-iforest/issues/36

titicaca, May 30 '22 03:05

I really appreciate your response, sir. Yes, I'm using Spark 3.2. I already looked at the issue you mentioned, but unfortunately I couldn't figure out which param I should change. I also considered pulling the latest version of the code, but I had already done that earlier. You also mentioned in this answer (https://stackoverflow.com/a/56849894) that you had already fixed the problem and rewritten the code in the master branch. I looked at that as well, and it's the same code I'm using. Any suggestions? I think I'm missing something here.

MohamadSabha, May 31 '22 13:05

Which branch are you using? The master branch is tested on Spark 2.4.

titicaca, Jun 01 '22 01:06

I just merged the latest updates from master into the spark3 branch. You can try the spark3 branch if you are using Spark 3.2.

titicaca, Jun 01 '22 01:06
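The branch advice above can be written down as a small check. This is only a sketch: `recommended_branch` is a hypothetical helper that encodes what the thread states (master is tested on Spark 2.4, the spark3 branch is meant for Spark 3.x) and nothing more.

```python
def recommended_branch(spark_version):
    """Map a Spark version string such as "3.2.1" to a spark-iforest branch.

    Hypothetical helper: it only encodes the advice given in this thread,
    i.e. master for Spark 2.x and spark3 for Spark 3.x and later.
    """
    major = int(spark_version.split(".")[0])
    return "spark3" if major >= 3 else "master"
```

With an active session, the version string is available as `spark.version` (or `pyspark.__version__`), so `recommended_branch(spark.version)` tells you which branch to build against.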

Sir, thanks a lot for your help, I really appreciate it. I was finally able to save the model, but unfortunately I'm not able to load it. I get the following error:

"Py4JJavaError: An error occurred while calling o542.load. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 59.0 failed 1 times"

Also, while saving the model, this error is logged: "ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'". The model still gets saved, but I'm not sure whether the save completes correctly or whether something is going wrong.

I spent the past few days trying to solve these two errors, which is why I didn't reply to your comment sooner, but unfortunately I couldn't. I believe there's an incompatibility between my Spark version, my Python version, and the iforest version, so I think I will have to retrain the model every time I want to test or run my streaming application. I'm really out of options.

I would also like to ask whether there's a way to evaluate the model, because I couldn't find such a method in the implementation. I would be really thankful for that.

Note: my data is accelerometer data arriving in real time from a mobile phone, and the main idea of the work is to identify outliers in a streaming application using Spark, Apache Kafka, and the isolation forest algorithm.

Thanks a lot again for your amazing efforts, sir. I can't thank you enough.

MohamadSabha, Jun 05 '22 16:06

No problem. There might be some compatibility problems, because it hasn't been fully tested on Spark 3.2. I will look into it when I have time.

For model evaluation, you can refer to the example.

titicaca, Jun 09 '22 03:06
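One common way to evaluate an anomaly detector, assuming a labeled hold-out set is available, is the ROC AUC of the anomaly scores. The sketch below is plain Python and is not taken from the repository's example, which may do this differently; `roc_auc` is a hypothetical helper, and the scores are assumed to have been collected from the model's output beforehand.

```python
def roc_auc(scores, labels):
    """ROC AUC by pairwise comparison of anomaly scores.

    labels: 1 for a known anomaly, 0 for a normal sample.
    Returns the fraction of (anomaly, normal) pairs in which the anomaly
    has the higher score, counting ties as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value near 1.0 means anomalies consistently score higher than normal samples, and 0.5 is what random scoring would give. The pairwise loop is quadratic, so this is meant for a small labeled sample rather than the full stream.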

Thanks a lot for your efforts.

MohamadSabha, Jun 09 '22 15:06