SynapseML
java.util.NoSuchElementException: Param binSampleCount does not exist on LightGBM
**Describe the bug**
binSampleCount does not exist
Instantiation of LightGBM:

```python
model = LightGBMClassifier(featuresCol="idfFeatures", labelCol="label")
pipeline_cv_lr = Pipeline().setStages(
    [nltk_cleaner, count_vectorizer, idf, model]
)
model_cv_lr = pipeline_cv_lr.fit(train_data)
predictions_cv_lr = model_cv_lr.transform(test_data)
```
**Info (please complete the following information):**
- MMLSpark Version: mmlspark_2.11:1.0.0-rc3
- Spark Version: 2.4.5
- Spark Platform: Local Jupyter

**Error**
Py4JJavaError: An error occurred while calling o80.getParam.
: java.util.NoSuchElementException: Param binSampleCount does not exist.
at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.
@sarmientoj24 sorry about the trouble you are having. This looks very strange, because this parameter definitely does exist in lightgbm. I wonder if you somehow have multiple environments installed, or your python/pyspark code is somehow not in sync with the scala/Java code. How did you install the library in your local jupyter notebook?
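To illustrate how such an out-of-sync environment surfaces as this exact exception: the Python wrapper pushes every param it knows about to the JVM estimator by name, so a pip-installed wrapper that is newer than the jar on the classpath will ask the jar for a param it has never heard of. A toy sketch of that failure mode (the class names here are illustrative stand-ins, not SynapseML's real classes):

```python
class JvmEstimatorStub:
    """Toy stand-in for the Scala-side estimator loaded from an older jar."""
    # an older jar that predates the binSampleCount param
    _params = {"learningRate", "numLeaves", "numIterations"}

    def getParam(self, name):
        # mirrors org.apache.spark.ml.param.Params.getParam, which throws
        # NoSuchElementException on the JVM for an unknown param name
        if name not in self._params:
            raise LookupError(f"Param {name} does not exist.")
        return name


class PyWrapperStub:
    """Toy stand-in for a newer pip-installed Python wrapper."""
    # the newer wrapper knows one extra param that the old jar lacks
    _params = {"learningRate", "numLeaves", "numIterations", "binSampleCount"}

    def __init__(self, jvm):
        self._jvm = jvm

    def transfer_params(self):
        # analogous to _transfer_params_to_java: push every known param by name
        for p in sorted(self._params):
            self._jvm.getParam(p)


wrapper = PyWrapperStub(JvmEstimatorStub())
try:
    wrapper.transfer_params()
except LookupError as e:
    print(e)  # Param binSampleCount does not exist.
```

The fix in a real environment is not a code change but making the Python package version and the `spark.jars.packages` coordinate agree.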
@imatiach-msft it is installed like this
```python
session = SparkSession.builder \
    .appName("person-classifier") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()
spark_context = sql.SQLContext(session)
```
@sarmientoj24 very strange, it should just work. What kind of environment/cluster are you running on? Maybe this way doesn't work there? It is possible there are multiple mmlspark versions installed there somehow?
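One quick way to check the "multiple versions installed" hypothesis from inside the notebook is to list every installed distribution whose name looks like an MMLSpark/SynapseML Python package. This is a sketch; the prefixes below are just the common package names, adjust as needed:

```python
import importlib.metadata


def find_synapseml_installs(prefixes=("mmlspark", "synapseml", "synapse")):
    """Return (name, version) pairs for installed distributions that look
    like SynapseML/MMLSpark Python packages. More than one hit, or a
    version that differs from the spark.jars.packages coordinate,
    suggests the Python and JVM sides are out of sync."""
    hits = []
    for dist in importlib.metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if any(name.startswith(p) for p in prefixes):
            hits.append((dist.metadata["Name"], dist.version))
    return hits


print(find_synapseml_installs())
```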
@imatiach-msft I ran into a similar error, and I suspect it is due to an installation issue. I have Spark 3.1.2 in my local Jupyter notebook, and I followed the Python installation method from the official website: https://microsoft.github.io/SynapseML/
Since I am on Spark 3.1.2, I used this code for installation:
```python
spark = SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5-13-d1b51517-SNAPSHOT") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()
```
But it failed when I ran `import synapse` with a "no module found" error. I also tried in a Kaggle notebook and on GCP Dataproc; both gave the same error. Ultimately, I had to `pip install synapseml` and `pip install synapse` so that I could import the module and run the LightGBM model.
Could you please help me with this installation issue if possible? Thanks a lot!
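Part of the confusion in threads like this is that the Maven coordinate must match the Scala binary version of the Spark build: Spark 2.4 builds use Scala 2.11 (hence `mmlspark_2.11`), while Spark 3.x builds use Scala 2.12 (hence `synapseml_2.12`). A small hypothetical helper to sanity-check a coordinate against a Spark version; the mapping below is an assumption covering only the two Spark lines discussed in this thread:

```python
def scala_suffix_matches(spark_version: str, coordinate: str) -> bool:
    """Check that a Maven coordinate such as
    'com.microsoft.azure:synapseml_2.12:0.9.5' carries the Scala binary
    version expected for the given Spark version.

    Hypothetical helper for illustration: assumes Spark 2.x -> Scala 2.11
    and Spark 3.x -> Scala 2.12, which covers the versions in this thread.
    """
    expected = "2.11" if spark_version.startswith("2.") else "2.12"
    artifact = coordinate.split(":")[1]  # e.g. 'synapseml_2.12'
    return artifact.endswith("_" + expected)


print(scala_suffix_matches("3.1.2", "com.microsoft.azure:synapseml_2.12:0.9.5"))        # True
print(scala_suffix_matches("2.4.5", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3"))  # True
print(scala_suffix_matches("3.1.2", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3"))  # False
```

A mismatched suffix typically shows up as class-loading or import failures rather than the param error above, but both trace back to mixing artifacts from incompatible lines.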
My error is shown below. (I one-hot encoded the categorical features, then built a VectorAssembler over the numerical features and the one-hot encoded categorical features; my label is multiclass.)
> ---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_213/1295349156.py in <module>
----> 1 model = model.fit(updated_train)
/opt/conda/lib/python3.7/site-packages/pyspark/ml/base.py in fit(self, dataset, params)
159 return self.copy(params)._fit(dataset)
160 else:
--> 161 return self._fit(dataset)
162 else:
163 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
/opt/conda/lib/python3.7/site-packages/synapse/ml/lightgbm/LightGBMClassifier.py in _fit(self, dataset)
2015
2016 def _fit(self, dataset):
-> 2017 java_model = self._fit_java(dataset)
2018 return self._create_model(java_model)
2019
/opt/conda/lib/python3.7/site-packages/pyspark/ml/wrapper.py in _fit_java(self, dataset)
329 fitted Java model
330 """
--> 331 self._transfer_params_to_java()
332 return self._java_obj.fit(dataset._jdf)
333
/opt/conda/lib/python3.7/site-packages/synapse/ml/core/schema/Utils.py in _transfer_params_to_java(self)
132 self._java_obj.set(pair)
133 if self.hasDefault(param):
--> 134 pair = self._make_java_param_pair(param, self._defaultParamMap[param])
135 pair_defaults.append(pair)
136 if len(pair_defaults) > 0:
/opt/conda/lib/python3.7/site-packages/synapse/ml/core/serialize/java_params_patch.py in _mml_make_java_param_pair(self, param, value)
85 sc = SparkContext._active_spark_context
86 param = self._resolveParam(param)
---> 87 java_param = self._java_obj.getParam(param.name)
88 java_value = _mml_py2java(sc, value)
89 return java_param.w(java_value)
/opt/conda/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/opt/conda/lib/python3.7/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
/opt/conda/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o1001.getParam.
: java.util.NoSuchElementException: Param catSmooth does not exist.
at org.apache.spark.ml.param.Params.$anonfun$getParam$2(params.scala:705)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.ml.param.Params.getParam(params.scala:705)
at org.apache.spark.ml.param.Params.getParam$(params.scala:703)
at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:41)
at jdk.internal.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)