SparkSQLCompare does not work on Databricks serverless

Open scardella opened this issue 8 months ago • 5 comments

Got this error when running on serverless:

[NOT_SUPPORTED_WITH_SERVERLESS] PERSIST TABLE is not supported on serverless compute. SQLSTATE: 0A000

File , line 1

    compare = SparkSQLCompare(
        spark,
        old,
        new,
        join_columns=['CurrencyExchangeRateDate','SourceCurrencyCode', 'TargetCurrencyCode'],  # You can also specify a list of columns
        df1_name='old',  # Optional, defaults to 'df1'
        df2_name='new'  # Optional, defaults to 'df2'
    )

    # This method prints out a human-readable report summarizing and sampling differences
    print(compare.report())

scardella avatar Apr 24 '25 13:04 scardella

@scardella Thanks for the issue. I believe this is a known limitation of Databricks serverless compute: https://docs.databricks.com/aws/en/compute/serverless/limitations

Given that serverless doesn't support the full suite of options, I'm not sure this is something we can support or tweak. Maybe there is a case for a special Databricks comparer? I don't have Databricks access, so I can't really test this out.

fdosani avatar Apr 24 '25 15:04 fdosani

Given the issue, all you really need is to eliminate some cache() or persist() type statements, or give the end user the ability to toggle them.
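For illustration, here is a minimal sketch of what such a toggle could look like. The helper name `maybe_cache` and the `use_cache` flag are hypothetical and not part of datacompy. The sketch assumes that the serverless failure surfaces as an exception raised by `cache()` whose message contains the `NOT_SUPPORTED_WITH_SERVERLESS` tag shown in the error above, which may not hold in every environment.

```python
from typing import Any

# Error tag taken from the serverless error message reported in this issue.
SERVERLESS_ERROR_TAG = "NOT_SUPPORTED_WITH_SERVERLESS"


def maybe_cache(df: Any, use_cache: bool = True) -> Any:
    """Hypothetical helper: cache a DataFrame only when allowed and supported.

    `df` is any object exposing a `cache()` method (for example a Spark
    DataFrame). When `use_cache` is False, the DataFrame is returned untouched.
    """
    if not use_cache:
        return df
    try:
        return df.cache()
    except Exception as exc:
        # Assumption: serverless compute rejects caching with this tag in the
        # message. Fall back to the uncached DataFrame in that case only.
        if SERVERLESS_ERROR_TAG in str(exc):
            return df
        raise
```

A comparer could route each internal `cache()` or `persist()` call through a helper like this, so that users on serverless compute either pass `use_cache=False` or rely on the fallback. Skipping the cache may cost recomputation, but the comparison results should be unaffected.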

scardella avatar Apr 24 '25 19:04 scardella

> Given the issue, all you really need is to eliminate some cache() or persist() type statements, or give the end user the ability to toggle them.

If you are able to test this out on your end and submit a PR, I'd be able to consider and review it. As I mentioned, I don't have Databricks serverless access.

fdosani avatar Apr 24 '25 19:04 fdosani