SynapseML
[BUG] 'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel'
SynapseML version
0.10.1
System information
Language version: Python 3.8.10, Scala 2.12
Spark version: Apache Spark 3.2.1
Spark platform: Databricks
Describe the problem
When trying to load a saved pipeline model containing a LightGBM stage, I encountered this error message: 'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel'
This happens even though I imported LightGBMClassificationModel from synapse.ml.lightgbm before trying to load the pipeline model.
Code to reproduce issue
from pyspark.ml.pipeline import PipelineModel
from synapse.ml.lightgbm import LightGBMClassificationModel, LightGBMClassifier
clf = PipelineModel.load(model_savepath)
Other info / logs
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<command-2087039020756525> in <module>
1 # Load model
2 from pyspark.ml.pipeline import PipelineModel
----> 3 clf = PipelineModel.load(model_savepath)
/databricks/spark/python/pyspark/ml/util.py in load(cls, path)
461 def load(cls, path):
462 """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 463 return cls.read().load(path)
464
465
/databricks/spark/python/pyspark/ml/pipeline.py in load(self, path)
258 return JavaMLReader(self.cls).load(path)
259 else:
--> 260 uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)
261 return PipelineModel(stages=stages)._resetUid(uid)
262
/databricks/spark/python/pyspark/ml/pipeline.py in load(metadata, sc, path)
394 stagePath = \
395 PipelineSharedReadWrite.getStagePath(stageUid, index, len(stageUids), stagesDir)
--> 396 stage = DefaultParamsReader.loadParamsInstance(stagePath, sc)
397 stages.append(stage)
398 return (metadata['uid'], stages)
/databricks/spark/python/pyspark/ml/util.py in loadParamsInstance(path, sc)
719 else:
720 pythonClassName = metadata['class'].replace("org.apache.spark", "pyspark")
--> 721 py_type = DefaultParamsReader.__get_class(pythonClassName)
722 instance = py_type.load(path)
723 return instance
/databricks/spark/python/pyspark/ml/util.py in __get_class(clazz)
630 m = __import__(module)
631 for comp in parts[1:]:
--> 632 m = getattr(m, comp)
633 return m
634
AttributeError: module 'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel'
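To make the failure mode in this traceback concrete, here is a simplified, pure-Python sketch of what pyspark.ml.util.DefaultParamsReader does (the helper names python_class_name and get_class are ours, not pyspark APIs): only the org.apache.spark prefix is rewritten to the Python package, so a SynapseML Scala class name passes through unchanged, and Python then tries to resolve the raw JVM name as a dotted Python path, failing during the getattr walk.

```python
from collections import OrderedDict


def python_class_name(java_class):
    # Mirrors the prefix mapping in DefaultParamsReader.loadParamsInstance:
    # only org.apache.spark is rewritten; other namespaces pass through.
    return java_class.replace("org.apache.spark", "pyspark")


def get_class(clazz):
    # Mirrors DefaultParamsReader.__get_class: import the module, then walk
    # the remaining dotted components with getattr. A missing component
    # raises an AttributeError like the one reported above.
    parts = clazz.split(".")
    m = __import__(".".join(parts[:-1]))
    for comp in parts[1:]:
        m = getattr(m, comp)
    return m


# A Spark class name maps cleanly onto its pyspark counterpart:
assert python_class_name("org.apache.spark.ml.PipelineModel") == "pyspark.ml.PipelineModel"

# The SynapseML class name is left as the raw JVM name, which the reader
# then attempts to resolve as a Python module path:
assert python_class_name(
    "com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel"
).startswith("com.microsoft")

# The getattr walk itself works for any real dotted Python path:
assert get_class("collections.OrderedDict") is OrderedDict
```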
What component(s) does this bug affect?
- [ ] area/cognitive: Cognitive project
- [ ] area/core: Core project
- [ ] area/deep-learning: DeepLearning project
- [X] area/lightgbm: Lightgbm project
- [ ] area/opencv: Opencv project
- [ ] area/vw: VW project
- [ ] area/website: Website
- [ ] area/build: Project build system
- [ ] area/notebooks: Samples under notebooks folder
- [ ] area/docker: Docker usage
- [ ] area/models: Models related issue
What language(s) does this bug affect?
- [ ] language/scala: Scala source code
- [X] language/python: Pyspark APIs
- [ ] language/r: R APIs
- [ ] language/csharp: .NET APIs
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/synapse: Azure Synapse integrations
- [ ] integrations/azureml: Azure ML integrations
- [ ] integrations/databricks: Databricks integrations
Hey @sibyl1956 :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.
@svotaw -- could you take a look at this issue ? Thanks !
Can you give more context here? How did you save the model? What was the code to create the original Pipeline?
Having the same issue. Here's the code I used to train and save the model (note: the original snippet called TrainRegressor without importing it; the import is fixed below):
from synapse.ml.lightgbm import LightGBMRegressor
from synapse.ml.train import TrainRegressor, TrainedRegressorModel
from pyspark.ml.pipeline import PipelineModel

model = TrainRegressor(
    model=LightGBMRegressor(**model_params),
    inputCols=features,
    labelCol=target,
)
trained_model = model.fit(df_train)
trained_model.getModel().save('trained_model_pipeline')
loaded_model = PipelineModel.load('trained_model_pipeline')
Running that last line gives me the same error as the OP. Running on SynapseML 0.11.1, PySpark 3.2.3.
I can save the TrainedRegressorModel and use TrainedRegressorModel.load to load the model correctly, but PipelineModel.load seems like a more general way to load models, and I would prefer to use that.
Here is an anecdotal experience, for whatever it is worth:
I had the same problem and was able to get the pipeline to load by flattening the pipeline stages. It was erroring when my first stage in the pipeline was itself a pipeline of feature transformations. When I removed this nested pipeline structure I was able to load the saved pipeline.
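The flattening described above can be sketched as follows. This is a hedged illustration: flatten_stages is our helper name, not a pyspark API; it duck-types on getStages, the accessor that pyspark.ml.Pipeline exposes, and the flattened list would be passed back into a single top-level Pipeline before fitting and saving.

```python
def flatten_stages(stages):
    # Recursively expand nested Pipelines into one flat stage list;
    # anything without a getStages accessor is treated as a leaf stage.
    flat = []
    for stage in stages:
        if hasattr(stage, "getStages"):  # a nested Pipeline
            flat.extend(flatten_stages(stage.getStages()))
        else:  # a plain estimator or transformer
            flat.append(stage)
    return flat
```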
For a pyspark.ml.Pipeline where all stages were Java stages (estimators and transformers from the Spark MLlib library), the model could be saved and read without problems.
WORKS:
pipe = Pipeline(
    stages=[
        SomePysparkMLibTransformer,  # an instance of JavaMLWritable
        LightGBMClassifier(**model_params),
    ]
)
The error occurred when one of the transformers was a custom stage rather than a Java stage.
DOESN'T WORK:
pipe = Pipeline(
    stages=[
        SomeCustomTransformer,  # NOT an instance of JavaMLWritable
        LightGBMClassifier(**model_params),
    ]
)
In this case the PipelineModel.write method returned a non-Java writer. The classes synapse.ml.lightgbm.LightGBMClassifier and synapse.ml.lightgbm.LightGBMRegressor inherit the correct Java reader (pyspark.ml.util.JavaMLReadable) and writer (pyspark.ml.util.JavaMLWritable). The problem is with the superclass synapse.ml.core.schema.Utils.ComplexParamsMixin, which inherits only from pyspark.ml.util.MLReadable.
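The writer-dispatch behavior described here can be illustrated with toy stand-ins. These are not the real pyspark classes, and pipeline_uses_java_writer is a simplification of the check pyspark performs: the pipeline gets a Java-backed writer only when every stage is JavaMLWritable, so a single custom Python stage pushes the whole pipeline onto the Python persistence path.

```python
# Toy stand-ins, illustrative only (not the real pyspark types):
class JavaMLWritable:  # stand-in for pyspark.ml.util.JavaMLWritable
    pass

class JavaBackedStage(JavaMLWritable):  # e.g. a LightGBMClassifier
    pass

class CustomPythonStage:  # a custom transformer without a Java writer
    pass

def pipeline_uses_java_writer(stages):
    # Simplified dispatch rule: Java-backed persistence only if
    # every stage advertises a Java writer.
    return all(isinstance(s, JavaMLWritable) for s in stages)

assert pipeline_uses_java_writer([JavaBackedStage(), JavaBackedStage()])
assert not pipeline_uses_java_writer([CustomPythonStage(), JavaBackedStage()])
```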
I could bypass the problem by wrapping the estimator in a pyspark.ml.Pipeline. In that case the write method of the last stage returns a JavaMLWriter rather than the PipelineModelWriter.
pipe = Pipeline(
    stages=[
        SomeCustomTransformer,  # NOT an instance of JavaMLWritable
        Pipeline(
            stages=[
                LightGBMClassifier(**model_params),
            ]
        ),
    ]
)