datacompy SparkSQLCompare does not work on Databricks serverless

Got this error when running on serverless:

[NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported on serverless compute. SQLSTATE: 0A000
File , line 1
----> 1 compare = SparkSQLCompare(
      2     spark,
      3     old,
      4     new,
      5     join_columns=['CurrencyExchangeRateDate','SourceCurrencyCode', 'TargetCurrencyCode'],  # You can also specify a list of columns
      6     df1_name='old',  # Optional, defaults to 'df1'
      7     df2_name='new'  # Optional, defaults to 'df2'
      8 )
     10 # This method prints out a human-readable report summarizing and sampling differences
     11 print(compare.report())

Apr 24 '25 13:04 scardella

@scardella Thanks for the issue. I believe this is a known limitation of Databricks serverless compute: https://docs.databricks.com/aws/en/compute/serverless/limitations

Given that serverless doesn't support the full suite of options, I'm not sure this is something we can support or tweak. Maybe there is a case for a special Databricks comparer? I don't have Databricks access, so I can't really test this out.

Apr 24 '25 15:04 fdosani

Given the issue, all you really need is to eliminate some cache() or persist() type statements. OR give the end user the capacity to toggle that.
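For illustration, a minimal sketch of what such a toggle could look like. The helper name `maybe_cache` and its `enabled` flag are hypothetical and not part of datacompy's API; the only real call assumed is PySpark's `DataFrame.cache()`. It also assumes the serverless error is raised when `cache()` is called and that its message contains the `NOT_SUPPORTED_WITH_SERVERLESS` tag shown in the traceback above.

```python
from typing import Any


def maybe_cache(df: Any, enabled: bool = True) -> Any:
    """Cache df only when allowed (hypothetical helper, not datacompy API).

    df is expected to behave like a PySpark DataFrame, i.e. expose cache().
    """
    if not enabled:
        # Caller opted out, e.g. because they know they are on serverless.
        return df
    try:
        return df.cache()
    except Exception as exc:
        # Assumption: serverless rejects caching with this error tag.
        if "NOT_SUPPORTED_WITH_SERVERLESS" in str(exc):
            return df  # fall back to the uncached DataFrame
        raise  # unrelated failures should still surface
```

Inside a comparer, each `df.cache()` call would then become `maybe_cache(df, enabled=...)`, with the flag exposed to the end user. Skipping the cache only affects performance, not the comparison result.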

Apr 24 '25 19:04 scardella

> Given the issue, all you really need is to eliminate some cache() or persist() type statements. OR give the end user the capacity to toggle that.

If you are able to test this out on your end and submit a PR, I'd be happy to review it. As I mentioned, I don't have Databricks serverless access.

Apr 24 '25 19:04 fdosani