
Cannot install on Databricks 11.3 LTS (Spark 3.3.0)

Open lotsahelp opened this issue 2 years ago • 2 comments

I'm trying to install the azure-cosmos-spark_3-3_2-12 (v4.15.0) connector from Maven, and it never finishes installing. I have also tried downloading the jar from Maven and installing it manually. The upload/install takes a few minutes, but I'm left with the message below each time I try to call Cosmos. Switching back to Databricks 10.4 LTS with the 3-2_2-12 connector works fine.
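For reference, the full Maven coordinate for the library described above would typically be (group ID assumed from the published Azure artifacts, not stated in the report):

```
com.azure.cosmos.spark:azure-cosmos-spark_3-3_2-12:4.15.0
```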

```
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-725919012830118> in <cell line: 2>()
      1 ##CREATE Container and Database
----> 2 spark.sql(f'CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};')
      3 
      4 spark.sql(
      5     f"""

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
   1117             sqlQuery = formatter.format(sqlQuery, **kwargs)
   1118         try:
-> 1119             return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1120         finally:
   1121             if len(kwargs) > 0:

/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    200                 # Hide where the exception came from that shows a non-Pythonic
    201                 # JVM exception message.
--> 202                 raise converted from None
    203             else:
    204                 raise

AnalysisException: Catalog 'cosmoscatalog' not found
```

lotsahelp avatar Jan 09 '23 20:01 lotsahelp

The error "Catalog 'cosmoscatalog' not found" indicates that the Spark Catalog with identifier "cosmoscatalog" has not been configured.

This is done by adding the following entries to the Spark config (note that the endpoint setting takes the account endpoint URI, not the account key):

spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", "&lt;YourAccountEndpoint&gt;")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", "&lt;YourMasterKey&gt;")
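Put together as a notebook cell, the configuration plus the failing call from the traceback would look roughly like this (a sketch, not a tested repro: the endpoint/key values are placeholders, and `cosmosDatabaseName` is assumed to be defined earlier in the notebook):

```python
# Configure the Cosmos catalog BEFORE any spark.sql call that references it.
# Placeholders below must be replaced with your own account values.
spark.conf.set(
    "spark.sql.catalog.cosmosCatalog",
    "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set(
    "spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint",
    "https://<your-account>.documents.azure.com:443/")
spark.conf.set(
    "spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey",
    "<YourMasterKey>")

# This is the statement that raised AnalysisException when the catalog
# settings were missing.
spark.sql(f"CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};")
```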

From the behavior you describe, it's possible that the Spark 3.2 cluster has these Spark config settings defined in the cluster config, so they are applied at start-up, while the Spark 3.3 cluster doesn't have them?
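In the Databricks cluster configuration ("Advanced options" > "Spark config"), settings are entered as space-separated key/value pairs rather than as `spark.conf.set` calls; the same catalog settings at cluster level would look roughly like this (endpoint and key values are placeholders):

```
spark.sql.catalog.cosmosCatalog com.azure.cosmos.spark.CosmosCatalog
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint https://<your-account>.documents.azure.com:443/
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey <YourMasterKey>
```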

Thanks, Fabian

FabianMeiswinkel avatar Jan 09 '23 22:01 FabianMeiswinkel

@FabianMeiswinkel those three lines are in the cell above.

lotsahelp avatar Jan 10 '23 00:01 lotsahelp