
Config Value spark.sql.mapKeyDedupPolicy not Supported by Databricks SQL Warehouse

Open · ArtnerC opened this issue 1 year ago · 1 comment

Getting the error `spark.sql.mapKeyDedupPolicy is not supported by Databricks SQL Warehouses` when using the ibis pyspark backend with a Databricks SQL Warehouse cluster.

See: https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/td-p/7959

The config is set in `do_connect`: https://github.com/ibis-project/ibis/blame/e425ad57899f8ebbea29b57bb53cedb40ebd7193/ibis/backends/pyspark/__init__.py#L180

self._session.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
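
For context, a rough reproduction sketch of how the error surfaces. The `DatabricksSession` setup is an assumption about the environment (it is not part of this issue), and all connection details are placeholders:

from databricks.connect import DatabricksSession  # Spark Connect client for Databricks

import ibis

# Placeholders only; a real run needs valid workspace credentials.
spark = DatabricksSession.builder.remote(
    host="https://<workspace-url>",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

# Fails against a SQL Warehouse: do_connect calls the conf.set above,
# and the warehouse rejects spark.sql.mapKeyDedupPolicy.
con = ibis.pyspark.connect(spark)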

Workaround could be as simple as:

try:
    spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
except Exception as e:
    if "not available" in str(e):
        # SQL Warehouses reject setting this config; safe to continue without it
        print("Likely running in a SQL Warehouse")
    else:
        raise  # re-raise other exceptions, preserving the original traceback

but I'm not sure what other approaches there might be.

ArtnerC · May 30 '24 20:05

Thanks for opening this @ArtnerC! We'd happily accept a PR with the fix you suggest if you're interested in submitting one. I think we can skip checking for a specific error message and just ignore any exception raised on that line:

try:
    spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
except Exception:  # is there a specific exception class we could catch instead of just `Exception`?
    pass

Re: general Databricks support, we don't have any immediate plans to set up a Databricks testing environment (or a Databricks-specific backend, if one is needed), but if it's possible to make things work with just our existing pyspark backend, we'd happily continue to accept bug fixes toward making that work.

jcrist · May 31 '24 18:05

I tried to run this again to get the exact exception class being thrown, but it just worked this time.

Nevertheless, the Databricks docs say pretty clearly that job runs using unsupported properties will fail (https://docs.databricks.com/en/release-notes/serverless.html#version-202415), so I've submitted the PR.

ArtnerC · Aug 13 '24 21:08

I was incorrect and the error resurfaced. I was able to track it down to a SparkConnectGrpcException and implemented the specific exception check, falling back to the more generic PySparkException since Spark Connect is still fairly new. The latest version of the PR has been tested and fixes this error.
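
For reference, a minimal sketch of that shape of fix, assuming PySpark >= 3.4 (where `pyspark.errors` exists). The import fallback and the `session` variable are illustrative stand-ins, not the merged PR:

from pyspark.errors import PySparkException

try:
    # Only importable when the Spark Connect extras are installed.
    from pyspark.errors.exceptions.connect import SparkConnectGrpcException
except ImportError:
    # Assumption: on installs without the connect module, fall back to the base class.
    SparkConnectGrpcException = PySparkException

try:
    # `session` stands in for the backend's SparkSession (self._session in ibis).
    session.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
except (SparkConnectGrpcException, PySparkException):
    # SQL Warehouses reject this config; proceed without it. Map columns then
    # keep Spark's default EXCEPTION policy for duplicate keys.
    pass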

ArtnerC · Aug 22 '24 16:08