Azure Databricks MDS write fails: MapInPandas write_mds reports "Spark higher-order functions are not supported in Unity Catalog"

Open wolliq opened this issue 10 months ago • 2 comments

Environment

Azure Databricks 14.3 LTS used on a distributed cluster.

To reproduce

Steps to reproduce the behavior:

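  1. Run the following code:
# Imports assumed; they were not shown in the original report.
import pyspark.pandas as ps
from streaming.base.converters import dataframe_to_mds

d = {
    'code': ['test', 'test', 'test'],
    'age': ['test', 'test', 'test'],
    'id': ['ib4', '7h!', '67h']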
}

df_test = ps.DataFrame(data=d).to_spark()

mds_kwargs = {
    'out': 'test', 
    'columns': {
        'code': 'str', 
        'age': 'str',
        'id': 'str',
    },
    'keep_local': True
}

dataframe_to_mds(
    dataframe=df_test,
    merge_index=True,
    mds_kwargs=mds_kwargs
)

Expected behavior

The script should write the created DataFrame in MDS format, either to Unity Catalog or to local storage.

Additional context

The documentation is not accessible today. We are using version 0.7.x.

wolliq, Apr 15 '24 13:04

This is most likely not a bug on the streaming side. I have seen it before.

Can you try using non-serverless compute?

XiaohanZhangCMU, Apr 15 '24 15:04

Hi @XiaohanZhangCMU, sorry, I'm not sure I understand. Could you please be more specific about what you mean by non-serverless compute? I used a regular Databricks distributed cluster.

wolliq, Apr 22 '24 10:04