pypmml-spark

TypeError: 'JavaPackage' object is not callable error despite linking jars into Spark successfully

Open NatMzk opened this issue 3 years ago • 7 comments

I have run the link_pmml4s_jars_into_spark.py script successfully: [screenshot]

and the pmml4s jar files are present in the SPARK_HOME location: [screenshot]

However, TypeError: 'JavaPackage' object is not callable still occurs: [screenshot]

I am running Java Version=1.8.0_302 and Spark Version=3.2.1.

I would appreciate any suggestion as to what might be missing.

NatMzk avatar Jun 07 '22 05:06 NatMzk

@NatMzk ScoreModel.fromFile() expects a local pathname for the model. Could you use other methods such as fromBytes or fromString to load it? You would first need to read the model from the dbfs:/... path yourself.
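A minimal sketch of that suggestion, assuming the usual Databricks convention that dbfs:/ paths are also exposed under the local /dbfs mount (the model path here is a placeholder):

from pypmml_spark import ScoreModel

# Read the PMML document yourself from the DBFS local mount,
# then pass its contents to fromString instead of fromFile.
with open('/dbfs/path/to/model.pmml') as f:
    model = ScoreModel.fromString(f.read())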

scorebot avatar Jun 08 '22 01:06 scorebot

From my understanding, the dbfs path is Databricks's local path where the PMML XML model is located. I tried the fromBytes and fromString methods, but they result in the same error.

NatMzk avatar Jun 08 '22 06:06 NatMzk

@NatMzk Could you provide the full stack trace of the exception above? Also, try restarting the kernel before loading the model.

scorebot avatar Jun 08 '22 09:06 scorebot

I restarted the kernel by detaching and reattaching the notebook, with no results. The error trace is as follows:

[screenshot: error stack trace]

I am running Databricks Runtime Version 10.4 LTS on a single-node cluster (not pure Spark): Apache Spark=3.2.1, Java Version=1.8.0_302 (Azul Systems, Inc.).

NatMzk avatar Jun 08 '22 09:06 NatMzk

I don't have the Databricks Runtime, but when I remove the links created by the script link_pmml4s_jars_into_spark.py, I can reproduce the same error on my side. So I suspect your issue has the same cause: the dependent jars of pmml4s are not found by Spark. There are several ways to try:

For details about the following configurations, see the official doc: https://spark.apache.org/docs/latest/configuration.html

All of these can be specified in the conf file or on the command line; check the doc for your environment. Taking the command line that launches pyspark as an example:

  1. Set spark.jars:
pyspark --conf spark.jars="$(echo /Path/To/pypmml_spark/jars/*.jar | tr ' ' ',')"
  2. Set spark.jars.packages:
pyspark --conf spark.jars.packages=org.pmml4s:pmml4s_2.12:0.9.16,org.pmml4s:pmml4s-spark_2.12:0.9.16,io.spray:spray-json_2.12:1.3.5,org.apache.commons:commons-math3:3.6.1
  3. Set spark.driver.extraClassPath and spark.executor.extraClassPath:
pyspark --conf spark.driver.extraClassPath="/Path/To/pypmml_spark/jars/*" --conf spark.executor.extraClassPath="/Path/To/pypmml_spark/jars/*"

I recommend options 1 and 2.
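In an environment like Databricks, where pyspark is not launched from a shell, the equivalent settings would go in the cluster's Spark config. A rough sketch of the same idea when building a session yourself (note that these settings only take effect before the JVM starts, so they won't apply retroactively to an already-running session):

from pyspark.sql import SparkSession

# Equivalent of option 2: resolve the pmml4s dependencies from Maven
# coordinates when the session is created.
spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.pmml4s:pmml4s_2.12:0.9.16,"
                 "org.pmml4s:pmml4s-spark_2.12:0.9.16,"
                 "io.spray:spray-json_2.12:1.3.5,"
                 "org.apache.commons:commons-math3:3.6.1")
         .getOrCreate())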

scorebot avatar Jun 10 '22 08:06 scorebot

@NatMzk Did the methods above resolve your issue?

scorebot avatar Jun 14 '22 01:06 scorebot

Another relatively simple way for Databricks is to copy the jar files to /databricks/jars, for example in a cluster init script.
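A minimal sketch of such an init script; the source path is an assumption about where pip installed the pypmml_spark package, so adjust it for your runtime:

#!/bin/bash
# Hypothetical cluster init script: copy the pmml4s jars bundled with
# pypmml_spark into /databricks/jars so they land on Spark's classpath.
cp /databricks/python/lib/python3*/site-packages/pypmml_spark/jars/*.jar /databricks/jars/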

jrauch-pros avatar Nov 12 '24 13:11 jrauch-pros