`enable_mosaic` incompatibility with Unity Catalog
Bug Description:
Calling enable_mosaic(spark, dbutils) on a UC-enabled cluster with Shared Access Mode throws an error due to this method: https://github.com/databrickslabs/mosaic/blob/5acbc2eeeb93a543c4bc978a381979c4ad44e2c9/python/mosaic/core/library_handler.py#L18
- We're running a Databricks cluster with Access mode set to `Shared`, enabling interaction with Unity Catalog tables
- Calling `enable_mosaic(spark, dbutils)` on this cluster gives a stack trace pointing to this line & throws a `py4j.security.Py4JSecurityException`:
Method public void org.apache.spark.api.java.JavaSparkContext.setLogLevel(java.lang.String) is not whitelisted on class class org.apache.spark.api.java.JavaSparkContext
- From my understanding, Unity Catalog requires strict isolation for security features, so setting log level isn't allowed
- We've found mosaic to be a really good fit for spark-based geospatial workloads and would love to use it with Unity Catalog! Any recommendations for a path fwd here?
Steps to Reproduce:
- Create a Databricks Cluster with Access Mode set to `Shared`
- Attach a notebook to the cluster
- Install `mosaic` via Cluster Libraries or `%pip install databricks-mosaic` in a notebook cell
- Import & call `enable_mosaic(spark, dbutils)`
- Should see the error in Description/Screenshots
Expected behavior:
- Calling `enable_mosaic` on a `Shared` Access Mode Databricks Cluster shouldn't throw an error
Screenshots:
- Cluster Config:
- Thrown error:
Additional context:
- Let me know if I can provide any! 👍🏾
- Is setting log level necessary for library functionality?
- Does `mosaic` internally call other methods that also aren't allow-listed when using Unity Catalog?
Thank you for reporting this @khalid-dev! Shared access clusters only support Python and SQL languages. Mosaic is written in Scala, with Python and SQL bindings. So when you install it in a shared access cluster it actually does not work because it is trying to execute Scala calls. We are working on white listing it, but for now you need to use "Assigned" access mode with Unity Catalog.
Of course @edurdevic! Appreciate you clarifying the Scala issue - I definitely overlooked that possibility. Curious for my own understanding: "white listing it" = white listing Scala? Or some methods needed for the library?
I also have more details after trying out "Assigned" access mode. I think I've narrowed the issue down to a DLT-UC-specific incompatibility:
- It seems we can query a normal Unity Catalog Table; awesome! ✅
- However, it seems that we can't query Delta Live Tables from an "Assigned" cluster. I've only tried querying streaming tables & materialized views - maybe Views work?
Ideally, we'd like to leverage mosaic's geospatial capabilities in Delta Live Tables pipelines with Unity Catalog. It seems that the DLT + UC combo restricts us to Shared mode. My current workaround is using a Job to:
- "Catalog dance" appropriate tables off Unity Catalog
- Invoke mosaic for geospatial processing on a compatible cluster
- Write results back to Unity Catalog location
Hope this info is helpful - thanks again for maintaining a great library 🥳
@khalid-dev - thanks for the detailed write-up of this issue. I recently ran into the exact same issue. The catalog dance workaround is a creative idea. If we come up with any alternatives, I'll drop a note here. Otherwise, we're keen to see this resolved at a lower level in the stack as our experience w/ Mosaic so far has been very favorable.