
`enable_mosaic` incompatibility with Unity Catalog

Open khalid-dev opened this issue 2 years ago • 3 comments

Bug Description:

Calling enable_mosaic(spark, dbutils) on a UC-enabled cluster with Shared Access Mode throws an error due to this method: https://github.com/databrickslabs/mosaic/blob/5acbc2eeeb93a543c4bc978a381979c4ad44e2c9/python/mosaic/core/library_handler.py#L18

  • We're running a Databricks cluster with Access mode set to Shared, enabling interaction with Unity Catalog tables
  • Calling enable_mosaic(spark, dbutils) on this cluster produces a stack trace pointing to the line linked above and throws a py4j.security.Py4JSecurityException:
    Method public void org.apache.spark.api.java.JavaSparkContext.setLogLevel(java.lang.String) is not whitelisted on class class org.apache.spark.api.java.JavaSparkContext
  • From my understanding, Unity Catalog requires strict isolation for security features, so setting log level isn't allowed
  • We've found mosaic to be a really good fit for spark-based geospatial workloads and would love to use it with Unity Catalog! Any recommendations for a path fwd here?
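
For anyone who wants to check whether their cluster blocks the log-level call before involving Mosaic at all, here is a minimal diagnostic sketch. The helper name `probe_set_log_level` is hypothetical and not part of Mosaic, and matching on the "is not whitelisted" text is an assumption based only on the error message quoted above. It does not fix `enable_mosaic`, since that call happens inside the library.

```python
def probe_set_log_level(spark_context, level="INFO"):
    """Try setLogLevel on a SparkContext-like object.

    Returns (True, "") if the call succeeds, or (False, message) if it fails
    with an allow-list error like the one quoted in this issue.
    Any other exception is re-raised unchanged.
    """
    try:
        spark_context.setLogLevel(level)
    except Exception as exc:  # the concrete Python exception type is not assumed here
        message = str(exc)
        # Assumption: the allow-list failure is recognizable by this phrase,
        # as in the Py4JSecurityException text reported above.
        if "is not whitelisted" in message:
            return False, message
        raise
    return True, ""
```

In a notebook this would be called as `probe_set_log_level(spark.sparkContext)`.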

Steps to Reproduce:

  1. Create a Databricks Cluster with Access Mode set to Shared
  2. Attach a notebook to cluster
  3. Install mosaic via Cluster Libraries or %pip install databricks-mosaic in a notebook cell
  4. Import & call enable_mosaic(spark, dbutils)
  5. Should see the error in Description/Screenshots
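
Steps 3 and 4 boil down to roughly the following notebook code. This is a sketch: the wrapper function `reproduce` is hypothetical and only exists so the import stays lazy, and `spark` and `dbutils` are the objects Databricks provides in a notebook session.

```python
# In a notebook cell, first install the library (step 3):
#   %pip install databricks-mosaic

def reproduce(spark, dbutils):
    """Import and call enable_mosaic (step 4)."""
    # Imported inside the function so this block can be defined even where
    # databricks-mosaic is not installed.
    from mosaic import enable_mosaic

    # On a Shared access mode cluster this is the call reported to raise
    # py4j.security.Py4JSecurityException.
    enable_mosaic(spark, dbutils)
```

Running `reproduce(spark, dbutils)` on a Shared access mode cluster should surface the error described above.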

Expected behavior:

  • Calling enable_mosaic on a Shared Access Mode Databricks Cluster shouldn't throw an error

Screenshots:

  • Cluster Config: Screen Shot 2023-04-26 at 12 55 53 PM
  • Thrown error: Screen Shot 2023-04-26 at 12 54 57 PM

Additional context:

  • Let me know if I can provide any! 👍🏾
  • Is setting log level necessary for library functionality?
  • Does mosaic internally call other methods that also aren't allow-listed when using Unity Catalog?

khalid-dev avatar Apr 26 '23 20:04 khalid-dev

Thank you for reporting this @khalid-dev! Shared access clusters only support Python and SQL languages. Mosaic is written in Scala, with Python and SQL bindings. So when you install it in a shared access cluster it actually does not work because it is trying to execute Scala calls. We are working on white listing it, but for now you need to use "Assigned" access mode with Unity Catalog.

edurdevic avatar May 04 '23 10:05 edurdevic

Of course @edurdevic! Appreciate you clarifying the Scala issue - I definitely overlooked that possibility. Curious for my own understanding: "white listing it" = white listing Scala? Or some methods needed for the library?

I also have more details after trying out "Assigned" access mode. I think I've narrowed the issue down to a DLT-UC-specific incompatibility:

  • It seems we can query a normal Unity Catalog Table; awesome! ✅ [Screenshots: Screen Shot 2023-05-05 at 12 38 39 PM, Screen Shot 2023-05-05 at 12 38 51 PM]
  • However, it seems that we can't query Delta Live Tables from an "Assigned" cluster. I've only tried querying streaming tables & materialized views - maybe Views work? [Screenshots: Screen Shot 2023-05-05 at 12 47 59 PM, Screen Shot 2023-05-05 at 12 47 03 PM, Screen Shot 2023-05-05 at 12 38 59 PM]

Ideally, we'd like to leverage mosaic's geospatial capabilities in Delta Live Tables pipelines with Unity Catalog. It seems that the DLT + UC combination restricts us to Shared mode. My current workaround is to use a Job to:

  1. "Catalog dance" appropriate tables off Unity Catalog
  2. Invoke mosaic for geospatial processing on a compatible cluster
  3. Write results back to Unity Catalog location [Screenshot: Screen Shot 2023-05-05 at 12 55 07 PM]
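
One possible reading of these three steps, sketched as PySpark helpers that could each run as a separate Job task. This is an illustration rather than the exact code used: the function names, the Delta staging path, and the use of `saveAsTable` for the write-back are all assumptions, and the geospatial work is passed in as a `transform` callable so that no Mosaic calls are invented here.

```python
def stage_out(spark, source_table, staging_path):
    """Step 1: copy a Unity Catalog table to a staging location.

    Assumption: this runs on a cluster that is allowed to read the source table.
    """
    df = spark.read.table(source_table)
    df.write.format("delta").mode("overwrite").save(staging_path)


def process_staged(spark, staging_path, transform):
    """Step 2: load the staged data and apply the geospatial processing.

    Assumption: this runs on a cluster where enable_mosaic works, and
    `transform` is a caller-supplied function from DataFrame to DataFrame.
    """
    staged = spark.read.format("delta").load(staging_path)
    return transform(staged)


def write_back(result_df, target_table):
    """Step 3: write the processed results back to Unity Catalog.

    Assumption: the target is a three-level table name (catalog.schema.table).
    """
    result_df.write.format("delta").mode("overwrite").saveAsTable(target_table)
```

Splitting the steps this way keeps the Mosaic-dependent part isolated to `process_staged`, so only that task needs the compatible cluster.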

Hope this info is helpful - thanks again for maintaining a great library 🥳

khalid-dev avatar May 05 '23 18:05 khalid-dev

@khalid-dev - thanks for the detailed write-up of this issue. I recently ran into the exact same issue. The catalog dance workaround is a creative idea. If we come up with any alternatives, I'll drop a note here. Otherwise, we're keen to see this resolved at a lower level in the stack as our experience w/ Mosaic so far has been very favorable.

kyleries avatar Jul 19 '23 15:07 kyleries