SynapseML
Failed to Load DataConversion
Describe the bug
com.microsoft.azure.synapse.ml.featurize.DataConversion doesn't implement read(). Saving works fine, but loading fails both when the stage is used on its own (DataConversion().load()) and when it is used inside an MLlib Pipeline/PipelineModel.
To Reproduce
Minimal reproduction:
import pyspark
from synapse.ml.featurize import DataConversion

spark = (
    pyspark.sql.SparkSession.builder.master("local[*]")
    .appName("App")
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:3.3.1,com.microsoft.azure:synapseml_2.12:0.9.5",
    )
    .getOrCreate()
)

path = "data_conversion.stage"
stage = DataConversion(cols=["input"], convertTo="string")
stage.save(path)
DataConversion().load(path)
Expected behavior
- DataConversion should save and load successfully.
Info (please complete the following information):
- SynapseML Version: 0.9.5
- Spark Version: 3.2.1
- Spark Platform: Standalone
**Stacktrace**
22/02/03 18:47:00 ERROR Instrumentation: java.lang.NoSuchMethodException: com.microsoft.azure.synapse.ml.featurize.DataConversion.read()
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:631)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:160)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:155)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
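The key frame above is DefaultParamsReader$.loadParamsInstanceReader, which reflectively looks up a static read() method on the stage's class and throws NoSuchMethodException when the class doesn't define one. A simplified Python analogy of that lookup (hypothetical class and function names, not the actual Spark or SynapseML code):

```python
# Simplified analogy of Spark's reflective reader lookup in ReadWrite.scala.
class BuiltInStage:
    """Stands in for a Spark stage that implements MLReadable."""

    @classmethod
    def read(cls):
        return f"reader for {cls.__name__}"


class BrokenStage:
    """Stands in for a stage like DataConversion: save works, read() is missing."""


def load_params_instance_reader(stage_cls):
    # Mirrors Class.getMethod("read"): resolve the reader or fail loudly.
    reader = getattr(stage_cls, "read", None)
    if reader is None:
        raise AttributeError(f"{stage_cls.__name__}.read() does not exist")
    return reader()


print(load_params_instance_reader(BuiltInStage))  # resolves fine

try:
    load_params_instance_reader(BrokenStage)
except AttributeError as e:
    print("load failed:", e)  # analogous to the NoSuchMethodException above
```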
I think, because load is a static method, it should be DataConversion.load(path).
LMK if this fixes it for you, and feel free to re-open if not.
@mhamilton723 I just tried it with DataConversion.load(path) and it's still giving me this error: com.microsoft.azure.synapse.ml.featurize.DataConversion.read does not exist in the JVM. DataConversion also fails to load when used in a Pipeline, so it seems like an implementation mistake in DataConversion?
We test this as part of the build but will take a look. Thanks for bringing this back up!
@mhamilton723 Really appreciate it! Please let me know if there's any way I can help.
@mhamilton723 Sorry for pushing, but this is really blocking us. Any updates, or any way I can help? Thanks!