SynapseML
[BUG] Databricks 14.3 LTS usage of internal _jvm variable is no longer supported
SynapseML version
com.microsoft.azure:synapseml_2.12:1.0.2
System information
- Spark Version 3.5.0
- Platform: Azure Databricks
- Operating System: Ubuntu 22.04.3 LTS
- Java: Zulu 8.74.0.17-CA-linux64
- Scala: 2.12.15
- Python: 3.10.12
Describe the problem
Quoting the Databricks 14.3 LTS documentation on shared access mode clusters: "There is no longer a dependency on the JVM when querying Apache Spark and, as a consequence, internal APIs related to the JVM, such as _jsc, _jconf, _jvm, _jsparkSession, _jreader, _jc, _jseq, _jdf, _jmap, and _jcols, are no longer supported."
The `writeToAzureSearch` Python wrapper still reaches into the JVM via `_jvm` here:
https://github.com/microsoft/SynapseML/blob/fa9ba2eac6ea5e219dcae0f0025ef2ca9313a081/cognitive/src/main/python/synapse/ml/services/search/AzureSearchWriter.py#L18
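The failure mode can be sketched with a small guard (a hypothetical helper, not part of SynapseML): the wrapper assumes `SparkContext` exposes a py4j `_jvm` view, which is absent on Spark Connect / shared access mode runtimes, so probing for it first would at least surface a clear error instead of an opaque Py4J failure.

```python
def require_jvm_access(spark_context):
    """Return the py4j JVM view of a SparkContext-like object, or raise a
    descriptive error when the runtime (e.g. Databricks shared access mode /
    Spark Connect) does not expose one.

    Hypothetical illustration only; SynapseML's wrapper calls
    SparkContext.getOrCreate()._jvm directly without such a check.
    """
    jvm = getattr(spark_context, "_jvm", None)
    if jvm is None:
        raise RuntimeError(
            "This API requires py4j JVM access (SparkContext._jvm), which is "
            "not available on Spark Connect / Databricks shared access mode "
            "clusters."
        )
    return jvm
```

A JVM-free code path (e.g. one built on public DataFrame APIs rather than py4j internals) would be the longer-term fix on these runtimes.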
Code to reproduce issue
from pyspark.sql.functions import lit, col
from synapse.ml.services import writeToAzureSearch

df = spark.read.table(index_delta_table)

df2 = df.select(
    col("id").alias("id"),
    col("subject").alias("subject"),
    lit("mergeOrUpload").alias("action"),
)

writeToAzureSearch(
    df2,
    subscriptionKey=ai_search_key,
    actionCol="action",
    serviceName=ai_search_name,
    indexName=index_name,
    batchSize="1000",
    keyCol="id",
)
Other info / logs
Note that the trace below does not fail on the `_jvm` access itself: the call reaches the JVM and then dies with a NoSuchMethodError on `RowEncoder$.apply(StructType)`, which suggests a binary incompatibility between the SynapseML 1.0.2 jar and the Spark 3.5 runtime in DBR 14.3, in addition to the unsupported internal-API issue in the title.
Py4JJavaError: An error occurred while calling z:com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc$lzycompute(SparkBindings.scala:17)
at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc(SparkBindings.scala:17)
at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.makeFromRowConverter(SparkBindings.scala:26)
at com.microsoft.azure.synapse.ml.io.http.ErrorUtils$.addErrorUDF(SimpleHTTPTransformer.scala:57)
at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.$anonfun$makePipeline$1(SimpleHTTPTransformer.scala:135)
at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8103)
at com.microsoft.azure.synapse.ml.stages.Lambda.$anonfun$transform$1(Lambda.scala:55)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
at com.microsoft.azure.synapse.ml.stages.Lambda.logVerb(Lambda.scala:24)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
at com.microsoft.azure.synapse.ml.stages.Lambda.logTransform(Lambda.scala:24)
at com.microsoft.azure.synapse.ml.stages.Lambda.transform(Lambda.scala:56)
at com.microsoft.azure.synapse.ml.stages.Lambda.transformSchema(Lambda.scala:64)
at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.transformSchema(SimpleHTTPTransformer.scala:170)
at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:72)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:148)
at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:141)
at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:45)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:309)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:289)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:289)
at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:308)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.$anonfun$transform$1(CognitiveServiceBase.scala:548)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logVerb(CognitiveServiceBase.scala:495)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logTransform(CognitiveServiceBase.scala:495)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.transform(CognitiveServiceBase.scala:548)
at com.microsoft.azure.synapse.ml.services.search.AddDocuments.super$transform(AzureSearch.scala:137)
at com.microsoft.azure.synapse.ml.services.search.AddDocuments.$anonfun$transform$1(AzureSearch.scala:137)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logVerb(CognitiveServiceBase.scala:495)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logTransform(CognitiveServiceBase.scala:495)
at com.microsoft.azure.synapse.ml.services.search.AddDocuments.transform(AzureSearch.scala:138)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.prepareDF(AzureSearch.scala:308)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:432)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:440)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write(AzureSearch.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
File <command-2049793672345868>, line 25
5 df = spark.read.table(index_delta_table)
7 df2 = df.select(
8 col("id").alias("id"),
9 col("subject").alias("subject"),
(...)
22 lit("mergeOrUpload").alias("action")
23 )
---> 25 writeToAzureSearch(df2,
26 subscriptionKey=ai_search_key,
27 actionCol="action",
28 serviceName=ai_search_name,
29 indexName=index_name,
30 batchSize='1000',
31 keyCol="id"
32 )
File /local_disk0/spark/userFiles/com_microsoft_azure_synapseml_cognitive_2_12_1_0_2.jar/synapse/ml/services/search/AzureSearchWriter.py:28, in writeToAzureSearch(df, **options)
26 jvm = SparkContext.getOrCreate()._jvm
27 writer = jvm.com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter
---> 28 writer.write(df._jdf, options)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:224, in capture_sql_exception.<locals>.deco(*a, **kw)
222 def deco(*a: Any, **kw: Any) -> Any:
223 try:
--> 224 return f(*a, **kw)
225 except Py4JJavaError as e:
226 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
What component(s) does this bug affect?
- [ ] area/cognitive: Cognitive project
- [X] area/core: Core project
- [ ] area/deep-learning: DeepLearning project
- [ ] area/lightgbm: Lightgbm project
- [ ] area/opencv: Opencv project
- [ ] area/vw: VW project
- [ ] area/website: Website
- [ ] area/build: Project build system
- [ ] area/notebooks: Samples under notebooks folder
- [ ] area/docker: Docker usage
- [ ] area/models: models related issue
What language(s) does this bug affect?
- [X] language/scala: Scala source code
- [X] language/python: Pyspark APIs
- [ ] language/r: R APIs
- [ ] language/csharp: .NET APIs
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/synapse: Azure Synapse integrations
- [ ] integrations/azureml: Azure ML integrations
- [X] integrations/databricks: Databricks integrations